stemedb

jordan/stemedb

Fork 0

Commit Graph

Author	SHA1	Message	Date
jordan	02ecac9a07	fix: merge upstream 10 commits, fix DashMap deadlock, deterministic sim ingestion Merged 10 upstream commits (MemTable, read-your-writes tests, feed endpoint, security hardening, signed assertions, source registry, dashboard enhancements) and fixed all test failures across the full workspace (2656/2656 passing). Key fixes: - fix(cluster): DashMap deadlock in swim.rs suspect_node/fail_node/alive_node - DashMap::get_mut RefMut + iter() on same map = non-reentrant write lock deadlock - Fix: extract clone in scoped block to drop RefMut before calling update_node_gauges() - 6 previously-hanging SWIM tests now pass in <2s - fix(sim): replace background-task+polling ingestion with synchronous process_pending() - smoke_high_volume_simulation was CPU-starved under 2656 parallel tests - Removed ingestor.start() + wait_until_ingested() pattern throughout sim - All arena functions now call ingestor.process_pending() directly (deterministic) - fix(test): v2 signature helper used wrong hash (rkyv vs canonical compute_content_hash_v2) - fix(test): quota test signed "test" but v1 requires "subject:predicate" format - fix(test): http_validation now accepts 400 for valid-format-but-invalid-crypto hex - fix(test): scale_adaptive micro tier assertions updated (auto_promote upstream change) - config: add nextest.toml with slow-timeout for background-task-tests group Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-20 20:27:32 -07:00
jml	3e7eddc074	feat: add enterprise production readiness infrastructure This commit implements comprehensive production hardening across multiple layers to prepare StemeDB for enterprise pilot deployments: ## API Layer - Add rate limiting middleware with configurable limits per endpoint - Enhance error handling with detailed context and proper HTTP status codes - Add security hardening tests for input validation and boundary conditions - Create store_helpers module for defensive storage access patterns ## Storage & WAL - Optimize group commit batching for higher throughput - Add defensive error handling in hybrid backend with proper fallbacks - Enhance WAL journal durability guarantees with fsync validation - Improve index store query performance with better caching ## Operations & Deployment - Add comprehensive operations documentation (deployment, monitoring, DR) - Create systemd units for backup, WAL archival, and verification - Add monitoring configs (Prometheus alerts, metrics exporters) - Implement backup/restore scripts with verification and S3 archival - Add DR drill automation and runbook procedures - Create load balancer configs (nginx, envoy) with health checks ## Documentation - Update CLAUDE.md with operations and troubleshooting guides - Expand roadmap with production readiness milestones - Add pilot success criteria and deployment reference architecture - Document TLS setup, monitoring integration, and incident response ## Configuration - Add .env.example with all required environment variables - Document resource sizing for different deployment scales - Add configuration examples for various deployment topologies This positions StemeDB for successful enterprise pilots with proper operational discipline, monitoring, backup/DR, and security hardening. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-12 06:08:15 +00:00

Author

SHA1

Message

Date

jordan

02ecac9a07

fix: merge upstream 10 commits, fix DashMap deadlock, deterministic sim ingestion

Merged 10 upstream commits (MemTable, read-your-writes tests, feed endpoint,
security hardening, signed assertions, source registry, dashboard enhancements)
and fixed all test failures across the full workspace (2656/2656 passing).

Key fixes:
- fix(cluster): DashMap deadlock in swim.rs suspect_node/fail_node/alive_node
  - DashMap::get_mut RefMut + iter() on same map = non-reentrant write lock deadlock
  - Fix: extract clone in scoped block to drop RefMut before calling update_node_gauges()
  - 6 previously-hanging SWIM tests now pass in <2s
- fix(sim): replace background-task+polling ingestion with synchronous process_pending()
  - smoke_high_volume_simulation was CPU-starved under 2656 parallel tests
  - Removed ingestor.start() + wait_until_ingested() pattern throughout sim
  - All arena functions now call ingestor.process_pending() directly (deterministic)
- fix(test): v2 signature helper used wrong hash (rkyv vs canonical compute_content_hash_v2)
- fix(test): quota test signed "test" but v1 requires "subject:predicate" format
- fix(test): http_validation now accepts 400 for valid-format-but-invalid-crypto hex
- fix(test): scale_adaptive micro tier assertions updated (auto_promote upstream change)
- config: add nextest.toml with slow-timeout for background-task-tests group

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 20:27:32 -07:00

jml

3e7eddc074

feat: add enterprise production readiness infrastructure

This commit implements comprehensive production hardening across multiple
layers to prepare StemeDB for enterprise pilot deployments:

## API Layer
- Add rate limiting middleware with configurable limits per endpoint
- Enhance error handling with detailed context and proper HTTP status codes
- Add security hardening tests for input validation and boundary conditions
- Create store_helpers module for defensive storage access patterns

## Storage & WAL
- Optimize group commit batching for higher throughput
- Add defensive error handling in hybrid backend with proper fallbacks
- Enhance WAL journal durability guarantees with fsync validation
- Improve index store query performance with better caching

## Operations & Deployment
- Add comprehensive operations documentation (deployment, monitoring, DR)
- Create systemd units for backup, WAL archival, and verification
- Add monitoring configs (Prometheus alerts, metrics exporters)
- Implement backup/restore scripts with verification and S3 archival
- Add DR drill automation and runbook procedures
- Create load balancer configs (nginx, envoy) with health checks

## Documentation
- Update CLAUDE.md with operations and troubleshooting guides
- Expand roadmap with production readiness milestones
- Add pilot success criteria and deployment reference architecture
- Document TLS setup, monitoring integration, and incident response

## Configuration
- Add .env.example with all required environment variables
- Document resource sizing for different deployment scales
- Add configuration examples for various deployment topologies

This positions StemeDB for successful enterprise pilots with proper
operational discipline, monitoring, backup/DR, and security hardening.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-12 06:08:15 +00:00

2 Commits