stemedb/applications/aphoria-dashboard/CORPUS_STATUS.md
jml cce54358d2 feat(aphoria): add git commit tracking + comprehensive documentation
2026-02-08 18:36:46 +00:00

Corpus Feature Status

Summary

  • Corpus API works: the /v1/aphoria/patterns endpoint is functional
  • Community sharing configured: Maxwell has community.enabled = true
  • Hosted mode configured: points to http://localhost:18180
  • Observations pushed: 66 observations successfully sent to the API
  • Corpus still empty: the observations went to the wrong endpoint

Root Cause

The HostedClient posts observations to:

POST /v1/aphoria/observations

But the pattern corpus is only populated by observations posted to the community endpoint, and it is served from the patterns endpoint:

POST /v1/aphoria/community/observations
GET  /v1/aphoria/patterns

Two different endpoints:

  1. /v1/aphoria/observations - Stores observations as assertions (for hosted team mode)
  2. /v1/aphoria/community/observations - Aggregates into pattern corpus (for community patterns)

What Happened

When we ran:

aphoria scan --persist --sync --config /home/jml/Workspace/maxwell/.aphoria/config.toml /home/jml/Workspace/maxwell

Logs showed:

[INFO] Using persistent mode (with Episteme storage) sync=true hosted=true
[INFO] Pushed observations to hosted server accepted=66 deduplicated=0

The observations were successfully pushed to /v1/aphoria/observations, but the corpus is built from /v1/aphoria/community/observations, so the corpus stayed empty.

Architecture Gap

There are two separate features that got conflated:

1. Hosted Mode (Team Server)

  • Purpose: Teams run private StemeDB server
  • Endpoint: /v1/aphoria/observations
  • What it does: Stores observations for team projects
  • Use case: "Acme Corp runs StemeDB for all their projects"

2. Community Corpus (Public Patterns)

  • Purpose: Anonymized pattern aggregation across projects
  • Endpoint: /v1/aphoria/community/observations
  • What it does: Aggregates by (subject, predicate, value), counts projects
  • Use case: "Show me what 100+ projects do for TLS settings"

Current State

Observations Storage:

curl http://localhost:18180/v1/aphoria/observations
# Returns: 66 observations stored as assertions

Corpus (Pattern Aggregates):

curl http://localhost:18180/v1/aphoria/patterns
# Returns: {"patterns": [], "total_matching": 0}

Solution Options

Option 1: Fix HostedClient (Proper Fix)

Update HostedClient to post to the community endpoint when community.enabled = true:

// In hosted.rs push_observations()
let endpoint = if config.community.is_enabled() {
    "/v1/aphoria/community/observations"
} else {
    "/v1/aphoria/observations"
};
let url = format!("{}{}", self.base_url, endpoint);
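
The routing decision above can be pulled out into a small pure function so it is testable without a running server. The following is a minimal sketch, not the actual HostedClient code: the `Config` and `CommunityConfig` structs, the constant names, and both function names are hypothetical stand-ins. Only the two endpoint paths and the `community.enabled` flag come from this document.

```rust
/// Hypothetical stand-in for the community section of the Aphoria config.
/// Only the `enabled` flag is taken from this document.
#[derive(Debug, Clone, Default)]
pub struct CommunityConfig {
    pub enabled: bool,
}

/// Hypothetical stand-in for the full config.
#[derive(Debug, Clone, Default)]
pub struct Config {
    pub community: CommunityConfig,
}

/// Endpoint that stores observations as assertions (hosted team mode).
pub const HOSTED_OBSERVATIONS: &str = "/v1/aphoria/observations";
/// Endpoint that aggregates observations into the pattern corpus.
pub const COMMUNITY_OBSERVATIONS: &str = "/v1/aphoria/community/observations";

/// Pick the path observations should be pushed to, based on the config.
pub fn observations_endpoint(config: &Config) -> &'static str {
    if config.community.enabled {
        COMMUNITY_OBSERVATIONS
    } else {
        HOSTED_OBSERVATIONS
    }
}

/// Join the base URL and the chosen path, tolerating a trailing slash
/// on the base URL so the result never contains a double slash.
pub fn observations_url(base_url: &str, config: &Config) -> String {
    format!(
        "{}{}",
        base_url.trim_end_matches('/'),
        observations_endpoint(config)
    )
}
```

Keeping the choice in one function means the push code only ever sees a finished URL, and the "wrong endpoint" failure described in this document becomes a unit-testable condition.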

Option 2: Add Aggregation Job (Alternative)

Create a background job that reads from /v1/aphoria/observations and aggregates into patterns:

// Periodically aggregate stored observations into patterns
fn aggregate_observations_to_patterns() {
    // Read observations from assertions store
    // Group by (subject, predicate, value)
    // Update pattern_aggregate_store
}
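
The grouping step in that outline can be sketched in memory as follows. This is an illustration under assumptions, not the server's aggregator: the `Observation` and `PatternAggregate` types and the `observation_count` field are hypothetical, and reading from the assertions store and writing to the pattern aggregate store are left out. The (subject, predicate, value) key and the per-project counting follow the description of the community corpus in this document.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical observation shape; the real stored assertion type may differ.
#[derive(Debug, Clone)]
pub struct Observation {
    pub subject: String,
    pub predicate: String,
    pub value: String,
    /// Anonymized project identifier, as in the community payload.
    pub project_hash: String,
}

/// Aggregation key: (subject, predicate, value).
pub type PatternKey = (String, String, String);

/// Hypothetical aggregate record for one pattern.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PatternAggregate {
    /// Total observations that matched this key.
    pub observation_count: usize,
    /// Number of distinct projects that reported this key.
    pub project_count: usize,
}

/// Group observations by key and count distinct projects per key.
pub fn aggregate_observations_to_patterns(
    observations: &[Observation],
) -> BTreeMap<PatternKey, PatternAggregate> {
    // Intermediate state: running count plus the set of projects seen.
    let mut groups: BTreeMap<PatternKey, (usize, BTreeSet<&str>)> = BTreeMap::new();

    for obs in observations {
        let key = (
            obs.subject.clone(),
            obs.predicate.clone(),
            obs.value.clone(),
        );
        let entry = groups.entry(key).or_insert_with(|| (0, BTreeSet::new()));
        entry.0 += 1;
        entry.1.insert(obs.project_hash.as_str());
    }

    // Collapse each project set into its size.
    groups
        .into_iter()
        .map(|(key, (count, projects))| {
            (
                key,
                PatternAggregate {
                    observation_count: count,
                    project_count: projects.len(),
                },
            )
        })
        .collect()
}
```

With a single scanned project, every pattern produced this way has a project count of 1, and the number of patterns is at most the number of observations.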

Option 3: Manual Aggregation (Workaround)

Directly call the community observations endpoint with anonymized data:

curl -X POST http://localhost:18180/v1/aphoria/community/observations \
  -H "Content-Type: application/json" \
  -d '{
    "observations": [...],
    "project_hash": "...",
    "client_version": "0.1.0"
  }'

Recommendation

Option 1 is the cleanest. The flow should be:

[User enables community.enabled = true in config]
         ↓
[Runs: aphoria scan --persist --sync]
         ↓
[HostedClient checks config.community.enabled]
         ↓
[If true: POST to /v1/aphoria/community/observations]
[If false: POST to /v1/aphoria/observations]
         ↓
[Community endpoint aggregates patterns]
         ↓
[Dashboard queries GET /v1/aphoria/patterns]
         ↓
[Shows community consensus]

What We Completed

  • DOCUMENTATION_INDEX.md - Complete guide index
  • Community sharing enabled in Maxwell config
  • Hosted mode configured to localhost
  • Scan successfully pushed 66 observations

Testing After Fix

Once Option 1 is implemented:

# 1. Run scan with community enabled
cargo run --bin aphoria -- scan \
  --config /home/jml/Workspace/maxwell/.aphoria/config.toml \
  --persist --sync \
  /home/jml/Workspace/maxwell

# 2. Verify corpus populated
curl http://localhost:18180/v1/aphoria/patterns | jq '.patterns | length'
# Should show a non-zero count: at most 66, fewer if several observations share one (subject, predicate, value) pattern

# 3. View in dashboard
open http://aphoria.local/corpus
# Should show Maxwell patterns with project_count=1

Files Changed

  1. Maxwell config: /home/jml/Workspace/maxwell/.aphoria/config.toml

    • Added [hosted] section
    • Set community.enabled = true

  2. Documentation: Created comprehensive docs

    • DOCUMENTATION_INDEX.md
    • CORPUS_IMPROVEMENTS.md
    • PROJECT_PATH_IMPLEMENTATION.md

Next Steps

  1. Fix HostedClient to use community endpoint when community sharing enabled
  2. Run scan again - observations will go to correct endpoint
  3. Verify corpus populated in dashboard
  4. Scan StemeDB itself to add more patterns to corpus

Status: Observations successfully pushed
Issue: Wrong endpoint (architecture gap)
Fix: Update HostedClient routing logic
ETA: ~30 minutes to implement Option 1

Last Updated: 2026-02-08