Corpus Feature Status
Summary
✅ Corpus API works - /v1/aphoria/patterns endpoint is functional
✅ Community sharing configured - Maxwell has community.enabled = true
✅ Hosted mode configured - Points to http://localhost:18180
✅ Observations pushed - 66 observations successfully sent to API
❌ Corpus still empty - Observations went to wrong endpoint
Root Cause
The HostedClient posts observations to:
POST /v1/aphoria/observations
But the corpus aggregator ingests observations from, and serves patterns at, different endpoints:
POST /v1/aphoria/community/observations
GET /v1/aphoria/patterns
Two different endpoints:
- /v1/aphoria/observations - Stores observations as assertions (for hosted team mode)
- /v1/aphoria/community/observations - Aggregates observations into the pattern corpus (for community patterns)
What Happened
When we ran:
aphoria scan --persist --sync --config /home/jml/Workspace/maxwell/.aphoria/config.toml /home/jml/Workspace/maxwell
Logs showed:
[INFO] Using persistent mode (with Episteme storage) sync=true hosted=true
[INFO] Pushed observations to hosted server accepted=66 deduplicated=0
Observations were successfully pushed to /v1/aphoria/observations
But the corpus is only populated through /v1/aphoria/community/observations ❌
Architecture Gap
There are two separate features that got conflated:
1. Hosted Mode (Team Server)
- Purpose: Teams run private StemeDB server
- Endpoint: /v1/aphoria/observations
- What it does: Stores observations for team projects
- Use case: "Acme Corp runs StemeDB for all their projects"
2. Community Corpus (Public Patterns)
- Purpose: Anonymized pattern aggregation across projects
- Endpoint: /v1/aphoria/community/observations
- What it does: Aggregates by (subject, predicate, value) and counts the projects reporting each combination
- Use case: "Show me what 100+ projects do for TLS settings"
Current State
Observations Storage:
curl http://localhost:18180/v1/aphoria/observations
# Returns: 66 observations stored as assertions
Corpus (Pattern Aggregates):
curl http://localhost:18180/v1/aphoria/patterns
# Returns: {"patterns": [], "total_matching": 0}
Solution Options
Option 1: Fix HostedClient (Proper Fix)
Update HostedClient to post to the community endpoint when community.enabled = true:
// In hosted.rs push_observations()
let endpoint = if config.community.is_enabled() {
"/v1/aphoria/community/observations"
} else {
"/v1/aphoria/observations"
};
let url = format!("{}{}", self.base_url, endpoint);
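The routing in Option 1 can be sketched as a small, testable function. This is a minimal sketch, not the actual HostedClient code: Config, CommunityConfig, observations_endpoint and observations_url are hypothetical stand-ins, and only the community.enabled flag and the two endpoint paths come from this document. One caveat: Option 3's request body carries project_hash and client_version alongside observations, so switching the URL alone may not be sufficient; the payload sent to the community endpoint likely has to change as well.

```rust
/// Hypothetical stand-ins for the real Aphoria config types.
/// Only the `community.enabled` flag comes from this document.
struct CommunityConfig {
    enabled: bool,
}

impl CommunityConfig {
    fn is_enabled(&self) -> bool {
        self.enabled
    }
}

struct Config {
    community: CommunityConfig,
}

/// Hosted team mode: observations are stored as assertions.
const TEAM_OBSERVATIONS: &str = "/v1/aphoria/observations";
/// Community corpus: observations are aggregated into patterns.
const COMMUNITY_OBSERVATIONS: &str = "/v1/aphoria/community/observations";

/// Choose the push endpoint from the community-sharing flag.
fn observations_endpoint(config: &Config) -> &'static str {
    if config.community.is_enabled() {
        COMMUNITY_OBSERVATIONS
    } else {
        TEAM_OBSERVATIONS
    }
}

/// Join base URL and endpoint, trimming a trailing slash on the base
/// so the result never contains "//v1".
fn observations_url(base_url: &str, config: &Config) -> String {
    format!("{}{}", base_url.trim_end_matches('/'), observations_endpoint(config))
}

fn main() {
    let community = Config { community: CommunityConfig { enabled: true } };
    let team = Config { community: CommunityConfig { enabled: false } };
    println!("{}", observations_url("http://localhost:18180/", &community));
    println!("{}", observations_url("http://localhost:18180", &team));
}
```

Note that this routing is either/or: with community sharing enabled, observations would no longer reach /v1/aphoria/observations. If a team needs both team storage and community aggregation, pushing to both endpoints may be the better choice.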
Option 2: Add Aggregation Job (Alternative)
Create a background job that reads from /v1/aphoria/observations and aggregates the stored observations into patterns:
// Periodically aggregate stored observations into patterns
fn aggregate_observations_to_patterns() {
// Read observations from assertions store
// Group by (subject, predicate, value)
// Update pattern_aggregate_store
}
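The grouping step in Option 2 can be sketched with standard-library collections. This is a hypothetical illustration: Observation, Pattern, aggregate and obs are made-up names, the TLS values are example data, and the sketch assumes each stored observation carries a project identifier (project_hash here), which this document does not confirm. Collecting distinct project hashes in a set, rather than incrementing a counter, keeps repeated scans of the same project from inflating project_count.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical shape of a stored observation.
struct Observation {
    project_hash: String,
    subject: String,
    predicate: String,
    value: String,
}

/// One community pattern: a (subject, predicate, value) triple and the
/// number of distinct projects that reported it.
#[derive(Debug, PartialEq)]
struct Pattern {
    subject: String,
    predicate: String,
    value: String,
    project_count: usize,
}

/// Group observations by (subject, predicate, value) and count distinct projects.
/// BTreeMap keeps the output sorted by triple, so results are deterministic.
fn aggregate(observations: &[Observation]) -> Vec<Pattern> {
    let mut groups: BTreeMap<(&str, &str, &str), BTreeSet<&str>> = BTreeMap::new();
    for o in observations {
        groups
            .entry((o.subject.as_str(), o.predicate.as_str(), o.value.as_str()))
            .or_default()
            .insert(o.project_hash.as_str());
    }
    groups
        .into_iter()
        .map(|((subject, predicate, value), projects)| Pattern {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            value: value.to_string(),
            project_count: projects.len(),
        })
        .collect()
}

/// Convenience constructor for the example data.
fn obs(project: &str, subject: &str, predicate: &str, value: &str) -> Observation {
    Observation {
        project_hash: project.to_string(),
        subject: subject.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
    }
}

fn main() {
    let observations = vec![
        obs("p1", "tls", "min_version", "1.2"),
        obs("p2", "tls", "min_version", "1.2"),
        obs("p2", "tls", "min_version", "1.2"), // repeated scan of p2
        obs("p1", "tls", "min_version", "1.3"),
    ];
    for p in aggregate(&observations) {
        println!("{} {} {} -> {} project(s)", p.subject, p.predicate, p.value, p.project_count);
    }
}
```

With only Maxwell scanned, every pattern produced this way would have project_count = 1.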
Option 3: Manual Aggregation (Workaround)
Directly call the community observations endpoint with anonymized data:
curl -X POST http://localhost:18180/v1/aphoria/community/observations \
-H "Content-Type: application/json" \
-d '{
"observations": [...],
"project_hash": "...",
"client_version": "0.1.0"
}'
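Whichever option is chosen, the client has to produce the anonymized body shown above. A minimal sketch, assuming project_hash is derived from the project root path (this document does not say how it is computed); CommunitySubmission, project_hash and build_submission are hypothetical names, and CLIENT_VERSION simply reuses "0.1.0" from the curl example. DefaultHasher is only a placeholder: it is not cryptographic and its output is not guaranteed to be stable across Rust releases, so a real client would more likely use a stable hash such as SHA-256 from a crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Client version as shown in the curl example.
const CLIENT_VERSION: &str = "0.1.0";

/// Hypothetical Rust mirror of the community request body
/// (`observations`, `project_hash`, `client_version`).
struct CommunitySubmission<T> {
    observations: Vec<T>,
    project_hash: String,
    client_version: String,
}

/// Derive an opaque project identifier so the raw path is never sent.
/// Placeholder hash: not cryptographic, not stable across Rust releases.
fn project_hash(project_root: &str) -> String {
    let mut hasher = DefaultHasher::new();
    project_root.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Wrap observations with the anonymized project id and client version.
fn build_submission<T>(observations: Vec<T>, project_root: &str) -> CommunitySubmission<T> {
    CommunitySubmission {
        observations,
        project_hash: project_hash(project_root),
        client_version: CLIENT_VERSION.to_string(),
    }
}

fn main() {
    let submission = build_submission(vec!["observation"], "/home/jml/Workspace/maxwell");
    println!(
        "{} observation(s), project_hash={}, client_version={}",
        submission.observations.len(),
        submission.project_hash,
        submission.client_version
    );
}
```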
Recommendation
Option 1 is the cleanest. The flow should be:
[User enables community.enabled = true in config]
↓
[Runs: aphoria scan --persist --sync]
↓
[HostedClient checks config.community.enabled]
↓
[If true: POST to /v1/aphoria/community/observations]
[If false: POST to /v1/aphoria/observations]
↓
[Community endpoint aggregates patterns]
↓
[Dashboard queries GET /v1/aphoria/patterns]
↓
[Shows community consensus]
Documentation and Setup Completed
- ✅ DOCUMENTATION_INDEX.md - Complete guide index
- ✅ Community sharing enabled in Maxwell config
- ✅ Hosted mode configured to localhost
- ✅ Scan successfully pushed 66 observations
Testing After Fix
Once Option 1 is implemented:
# 1. Run scan with community enabled
cargo run --bin aphoria -- scan \
--config /home/jml/Workspace/maxwell/.aphoria/config.toml \
--persist --sync \
/home/jml/Workspace/maxwell
# 2. Verify corpus populated
curl http://localhost:18180/v1/aphoria/patterns | jq '.patterns | length'
# Should show: up to 66 (observations sharing a subject/predicate/value collapse into one pattern)
# 3. View in dashboard
open http://aphoria.local/corpus
# Should show Maxwell patterns with project_count=1
Files Changed
- Maxwell config: /home/jml/Workspace/maxwell/.aphoria/config.toml
  - Added [hosted] section
  - Enabled community.enabled = true
- Documentation: Created comprehensive docs
  - DOCUMENTATION_INDEX.md
  - CORPUS_IMPROVEMENTS.md
  - PROJECT_PATH_IMPLEMENTATION.md
Next Steps
- Fix HostedClient to use the community endpoint when community sharing is enabled
- Run the scan again; observations will go to the correct endpoint
- Verify corpus populated in dashboard
- Scan StemeDB itself to add more patterns to corpus
Status: Observations successfully pushed ✅
Issue: Wrong endpoint (architecture gap) ❌
Fix: Update HostedClient routing logic
ETA: ~30 minutes to implement Option 1
Last Updated: 2026-02-08