Commit Graph

16 Commits

Author SHA1 Message Date
jordan
cde30b9213 chore: apply rustfmt formatting across API handlers and core types
Reformats import blocks, function signatures, and expression line wrapping
in stemedb-api handlers, stemedb-core serde/source_record, and serde_helpers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 16:43:45 -07:00
jordan
ad07a75d0a feat: add source content to source registry, signed assertions, feed endpoint, dashboard enhancements
- Add `content: Option<String>` to SourceRecord with rkyv schema evolution
  (LegacySourceRecord compat deserializer for backward compatibility)
- Add MAX_SOURCE_CONTENT_LEN (1MB) limit with API validation
- Strip content from list responses, include in single-source GET
- Update Go SDK RegisterSourceRequest with Content field
- FCM pipeline extracts PDF text via pdftotext and passes to registration
- Dashboard impact panel fetches and displays source content with expand/collapse
- Add feed endpoint, dashboard feed panel, and signed assertion support
- Update data-structures.md, API docs, and storage docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:54:27 -07:00
jml
e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure
This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 20:18:20 +00:00
jordan
bbe6aedc40 feat: Aphoria security extractors + LLM evaluation architecture + ontology docs
New security extractors:
- insecure_deserialization, orm_injection, path_traversal, security_headers
- ssrf, unvalidated_redirects, weak_password, xxe
- Enhanced tls_version extractor with comprehensive cipher/protocol checks

Architecture docs:
- Scout-judge extraction pattern for LLM-based code analysis
- LLM prompt evaluation framework
- LLM eval implementation guide

Core improvements:
- stemedb-ontology README and client enhancements
- WAL journal/segment instrumentation
- Signing and ingestion refinements
- Consumer health demo script

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:22:55 -07:00
jordan
41c676a78e feat: Aphoria enterprise features + ontology SDK + file length compliance
Enterprise Features:
- Hosted mode with remote sync for team pattern aggregation
- Community sharing with privacy-preserving anonymization
- LLM-based semantic claim extraction with Gemini integration
- Pattern learning with promotion to declarative extractors
- High-entropy secrets extractor with configurable thresholds
- Auth bypass and insecure cookies extractors

Module Refactoring:
- Split oversized files to comply with 500-line limit
- Config split: types/core.rs, types/extractors.rs, types/hosted.rs, etc.
- Handlers split: scan.rs, policy.rs, report.rs modules
- Extractors split: declarative/, high_entropy_secrets/, insecure_cookies/
- Learning split: store modules with metrics and persistence

SDK & Ontology:
- stemedb-ontology SDK with fluent builders and StemeDB client
- Pharma domain extractors for FDA Orange Book data
- Consumer health UAT test infrastructure

Code Quality:
- Fixed clippy warnings (needless_borrows_for_generic_args)
- Added KVStore trait imports where needed
- Fixed utoipa path re-exports for OpenAPI docs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 12:55:29 -07:00
jordan
b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00
jordan
a734be3a0d feat: Phase 7 Content Defense + code structure refactoring
Content Defense (Phase 7):
- Add SimilarityIndex with MinHash/LSH for near-duplicate detection
- Add QuarantineStore for flagged assertions awaiting admin review
- Add CircuitBreakerStore for per-agent circuit breaker state
- Add ContentDefenseLayer for ingestion pipeline integration
- Add API endpoints for quarantine and circuit breaker management
- Add research module with gap detection and documentation fetching

Code Structure Improvements:
- Extract research CLI commands to research_commands.rs
- Extract API routers to routers.rs module
- Extract key_codec extraction functions to separate module
- Extract test modules to separate files across multiple crates
- All files now under 500 line limit per pre-commit hook

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:44:05 -07:00
jordan
d3a88585fe feat: Phase 6 UAT - Admission control, HLC recency, cluster coordination
This commit includes comprehensive work on Phase 6 features:

## Admission Control (Phase 6 admission middleware)
- AdmissionStore implementation backed by TrustRankStore
- PoW verification with tier-based difficulty computation
- Trust tier progression (Newcomer → Established → Trusted → Authority)
- API integration with admission status endpoints

## HLC Recency Lens (Phase 6C)
- HlcRecencyLens for distributed system ordering
- Hybrid logical clock integration with causality preservation

## Cluster Coordination (Phase 6C)
- Multi-node cluster tests (availability, partition tolerance)
- CRDT convergence tests for anti-entropy sync
- Gateway handler improvements

## Aphoria Code Linter (Phase 2A)
- RFC/OWASP corpus builders with network fetching and caching
- Concept hierarchy with auto-alias creation on conflict detection
- Multiple security extractors (TLS, JWT, CORS, secrets, rate limiting)

## Code Organization
- Split large files into modules to comply with 500-line limit
- Improved test organization with separate test modules
- Fixed rkyv serialization for EigenTrustState (AgentScore struct)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 00:43:37 -07:00
jordan
2b0923f20e feat: Distributed replication foundation (Phase 6A) - HLC, Merkle trees, CRDT stores, sync protocol
- Add Hybrid Logical Clock (HLC) for causality tracking across nodes
- Implement Merkle tree for efficient diff/sync with BLAKE3 hashing
- Add CRDT-aware stores for assertions and votes with vector clocks
- Create stemedb-sync crate with anti-entropy and gossip protocols
- Add stemedb-rpc crate with gRPC sync service (proto definitions)
- Implement SupersessionChain for tracking assertion lifecycles
- Add Aphoria application for code analysis/reporting
- Add battery11 replication test scaffolding
- Fix .gitignore to exclude nested target directories

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:31:54 -07:00
jordan
137a588ed0 feat: Concept hierarchy (Phase 5D) - ConceptPath, source schemes, AliasStore
Implements hierarchical subject identifiers with scheme-based source tier inference:

- ConceptPath type with parse/wire_format, leaf/parent, prefix matching
- SourceScheme registry mapping schemes to default SourceClass tiers:
  - rfc://, fda://, ietf:// → Regulatory (Tier 0)
  - peer://, pubmed:// → PeerReviewed (Tier 1)
  - code://, wiki:// → Expert (Tier 3)
  - blog://, anon:// → Anecdotal (Tier 5)
- AliasStore for cross-scheme entity resolution (bidirectional indexing)
- API endpoints for concept operations
- Battery tests 8, 9 & 10 for concepts, aliases, and advanced signatures
- Go SDK updates for concept types and signing

Completes Phase 5, advancing to Phase 6 (Distributed Writes).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 17:44:54 -07:00
jordan
55349845d0 refactor: Split all files to enforce 500-line max
Break monolith source files into focused modules:
- stemedb-core/types.rs → types/ directory (assertion, source, gold_standard, etc.)
- stemedb-storage: audit_store, quota_store, trust_rank_store, vector_index, vote_store → module directories
- stemedb-ingest/worker.rs → worker/ with separate test modules
- stemedb-query: engine, materializer, query → module directories
- stemedb-lens: epoch_aware, skeptic → module directories
- stemedb-sim/lib.rs → agent, arenas/, helpers, runner, strategy, types
- stemedb-api/tests: integration_tests → http_basic, http_validation, http_epoch, http_pipeline
- stemedb-api/tests: e2e_flow_test → e2e_full_pipeline, e2e_lens_resolution
- stemedb-query/tests: e2e_pipeline → e2e_pipeline + e2e_decay

Also adds new features: gold standard verification, escalation handlers,
admin endpoints, concept hierarchy spec, arena roadmap, and Go SDK.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 01:13:45 -07:00
jordan
c59066949a feat: Add quickstart "Beyond Hello World" sections with Skeptic and Layered endpoints
- Add Layered() method to Go SDK for per-source-class consensus queries
- Add LayeredQueryParams, LayeredResult, TierResolution types to Go SDK
- Create conflict example demonstrating Skeptic and Layered endpoints
- Update quickstart.md with sections 6 (conflict detection) and 7 (authority tiers)
- Remove tracked Go binary and add data/ to .gitignore

The new quickstart sections demonstrate Episteme's differentiating features:
- Skeptic endpoint shows "Trust but Verify" conflict analysis
- Layered endpoint shows per-tier resolution (Clinical vs Anecdotal)

Note: Pre-existing large files flagged by pre-commit hook (technical debt from prior sessions)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 21:00:59 -07:00
jordan
152df4b0b4 docs: Mark Phase 2.4, 2.5, 2.6 as complete in roadmap
- 2.4 Visual Hash Query: hamming_distance, visual_near/threshold implemented
- 2.5 Vector Field: N/A (Phase 3 work, scaffolding correct)
- 2.6 E2E Integration Test: e2e_pipeline.rs with 5 comprehensive tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:33:03 -07:00
jordan
1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
This commit adds the read path (Cortex) to complement the write path (Spine):

## Crates
- stemedb-api: HTTP API with axum + utoipa OpenAPI
  - /v1/assert, /v1/query, /v1/epoch, /v1/skeptic, /v1/trace, /v1/audit
  - Metered endpoints with quota enforcement
  - Ed25519 signature verification
- stemedb-lens: Truth resolution lenses
  - RecencyLens, ConsensusLens, ConfidenceLens
  - VoteAwareConsensusLens (Ballot Box pattern)
  - TrustAwareAuthorityLens (The Hive pattern)
  - SkepticLens (conflict analysis)
  - EpochAwareLens (paradigm-safe queries)
- stemedb-query: Query engine with materialized views

## Storage Extensions
- VoteStore: Vote aggregation with cached counts
- TrustRankStore: Agent reputation with decay
- AuditStore: Query audit trail
- IndexStore: SP/P/S index structures
- SupersessionStore: Epoch supersession chains

## SDKs
- sdk/go/steme: Go HTTP client with Ed25519 signing
- sdk/go/adk: ADK-Go tools for AI agents

## Documentation
- Updated CLAUDE.md, architecture.md, roadmap.md
- New ai-lookup entries for all services
- Use case docs for consumer health intelligence
- Arena roadmap for simulation advancement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00
jordan
3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00
jordan
a776744889 Initial project setup with Claude Code monorepo structure
- Rust workspace with stemedb-core crate
- Full .claude/ configuration (agents, skills, commands, guides)
- ai-lookup/ for token-efficient fact storage
- Quality gates: clippy, fmt, jscpd duplication detection
- Pre-commit hook with 5-phase quality checks
- CLAUDE.md router and CODING_GUIDELINES.md standards

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 10:56:26 -07:00