tidaldb/docs/planning/milestone-8/phase-5/task-01-tenant-identity.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 13:17:19 -07:00


# Task 01: TenantId + TenantConfig
## Delivers
`TenantId(u64)` and `TenantConfig` in `tidal/src/replication/tenant.rs`. Per-tenant quotas (signals/sec token bucket, max entities, max storage bytes). WAL segment directories for non-default tenants namespaced under `{data_dir}/tenants/{tenant_id}/wal/`; the default tenant keeps `{data_dir}/wal/`. `TidalError::QuotaExceeded` returned when limits are breached.
## Complexity: M
## Dependencies
- Phase 8.1 (ShardId, RegionId)
- Phase 8.2 (WAL segment naming)
## Technical Design
```rust
// tidal/src/replication/tenant.rs

/// Tenant identity type.
///
/// A tenant is an agent workspace or an isolated application namespace.
/// All data (WAL segments, signal ledger state, entity metadata) is
/// scoped to a tenant's filesystem directory.
///
/// `TenantId(0)` is the default single-tenant ID used by non-multi-tenant
/// deployments. This ensures backward compatibility with all existing code.
#[derive(
    Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash,
    Default,
    serde::Serialize, serde::Deserialize,
)]
pub struct TenantId(pub u64);

impl TenantId {
    /// The default tenant ID for single-tenant deployments.
    pub const DEFAULT: Self = Self(0);
}

impl std::fmt::Display for TenantId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "t{}", self.0)
    }
}

/// Per-tenant resource configuration.
///
/// Enforced at write time. Violations return `TidalError::QuotaExceeded`.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct TenantConfig {
    pub tenant_id: TenantId,
    /// Maximum signals per second (token bucket rate limit).
    ///
    /// `None` means unlimited (trusted internal tenant).
    pub max_signals_per_sec: Option<u32>,
    /// Maximum number of distinct entities (items + users + creators).
    ///
    /// Checked on entity create; `None` means unlimited.
    pub max_entities: Option<u64>,
    /// Maximum total storage in bytes for this tenant's data directory.
    ///
    /// Checked on WAL segment seal; `None` means unlimited.
    pub max_storage_bytes: Option<u64>,
    /// Residency policy: which regions this tenant's data must reside in.
    ///
    /// Empty = no restriction. Used by `TenantRouter` to constrain placement.
    pub required_regions: Vec<RegionId>,
    /// Human-readable label for this tenant (for monitoring/logging).
    pub label: String,
}

impl TenantConfig {
    /// Default config: unlimited quotas, no residency constraint.
    pub fn default_tenant() -> Self {
        Self {
            tenant_id: TenantId::DEFAULT,
            max_signals_per_sec: None,
            max_entities: None,
            max_storage_bytes: None,
            required_regions: Vec::new(),
            label: "default".to_string(),
        }
    }
}

/// Token bucket rate limiter for per-tenant signal ingestion.
///
/// Refills at `max_signals_per_sec` tokens per second.
/// Costs 1 token per signal write. Bucket max = 2x rate (burst headroom).
///
/// The token count and the refill timestamp are guarded by a single mutex:
/// `std` has no `AtomicF64`, and updating the two values through separate
/// relaxed loads and stores would let concurrent writers lose decrements
/// or apply the same refill twice.
#[derive(Debug)]
pub struct TenantRateLimiter {
    /// Refill rate (tokens/ns).
    refill_rate_per_ns: f64,
    /// Maximum bucket size (tokens).
    max_tokens: f64,
    /// Mutable bucket state, updated as one unit under the lock.
    state: std::sync::Mutex<BucketState>,
}

#[derive(Debug)]
struct BucketState {
    /// Current tokens (f64 for sub-token precision).
    tokens: f64,
    /// Last refill timestamp (ns).
    last_refill_ns: u64,
}

impl TenantRateLimiter {
    pub fn new(max_signals_per_sec: u32) -> Self {
        let rate_per_ns = f64::from(max_signals_per_sec) / 1_000_000_000.0;
        let max_tokens = f64::from(max_signals_per_sec) * 2.0; // 2s burst
        Self {
            refill_rate_per_ns: rate_per_ns,
            max_tokens,
            state: std::sync::Mutex::new(BucketState {
                tokens: max_tokens,
                last_refill_ns: crate::util::now_ns(),
            }),
        }
    }

    /// Try to consume 1 token. Returns `Ok(())` if allowed, `Err(QuotaExceeded)` if throttled.
    pub fn try_acquire(&self) -> Result<()> {
        // The state is plain numbers, so a poisoned lock is still usable.
        let mut state = self.state.lock().unwrap_or_else(|e| e.into_inner());
        // Read the clock while holding the lock so timestamps never go backwards.
        let now = crate::util::now_ns();
        let elapsed_ns = now.saturating_sub(state.last_refill_ns);
        let refill = elapsed_ns as f64 * self.refill_rate_per_ns;
        // Record the refill even when the request ends up rejected.
        state.tokens = (state.tokens + refill).min(self.max_tokens);
        state.last_refill_ns = now;
        if state.tokens < 1.0 {
            return Err(TidalError::QuotaExceeded("signal rate limit exceeded".into()));
        }
        state.tokens -= 1.0;
        Ok(())
    }
}
```
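The design above states when `max_entities` and `max_storage_bytes` are checked (entity create and WAL segment seal) but does not show the checks themselves. The following is a minimal sketch of what they might look like. `Quotas`, `QuotaError`, `check_entity_quota`, and `check_storage_quota` are hypothetical names used only for illustration; the real code would presumably take `&TenantConfig` and return `TidalError::QuotaExceeded`. Whether the segment being sealed counts against the storage limit is also an assumption.

```rust
/// Hypothetical stand-in for `TidalError::QuotaExceeded`.
#[derive(Debug, PartialEq)]
pub enum QuotaError {
    QuotaExceeded(String),
}

/// Hypothetical subset of `TenantConfig` holding the capacity quotas.
pub struct Quotas {
    pub max_entities: Option<u64>,
    pub max_storage_bytes: Option<u64>,
}

/// Check run before creating one more entity.
/// `None` means unlimited, matching the field docs in the design.
pub fn check_entity_quota(q: &Quotas, current_entities: u64) -> Result<(), QuotaError> {
    match q.max_entities {
        Some(max) if current_entities >= max => Err(QuotaError::QuotaExceeded(format!(
            "entity limit {max} reached"
        ))),
        _ => Ok(()),
    }
}

/// Check run when sealing a WAL segment of `segment_bytes`.
/// Assumption: the sealed segment itself counts against the limit.
pub fn check_storage_quota(
    q: &Quotas,
    used_bytes: u64,
    segment_bytes: u64,
) -> Result<(), QuotaError> {
    match q.max_storage_bytes {
        // saturating_add avoids overflow when usage is close to u64::MAX.
        Some(max) if used_bytes.saturating_add(segment_bytes) > max => {
            Err(QuotaError::QuotaExceeded(format!(
                "storage limit {max} bytes exceeded"
            )))
        }
        _ => Ok(()),
    }
}
```

Keeping the checks as pure functions over the current usage numbers makes them easy to unit test without a live database; how usage is tracked per tenant is outside the scope of this sketch.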
### WAL Directory Namespacing
```rust
// tidal/src/wal/segment.rs (additions)

/// Build the tenant-scoped WAL directory path.
///
/// For `TenantId::DEFAULT` (backward compat): returns `{data_dir}/wal/` unchanged.
/// For other tenants: returns `{data_dir}/tenants/{tenant_id}/wal/`.
pub fn tenant_wal_dir(data_dir: &Path, tenant_id: TenantId) -> PathBuf {
    if tenant_id == TenantId::DEFAULT {
        data_dir.join("wal")
    } else {
        data_dir
            .join("tenants")
            .join(tenant_id.0.to_string())
            .join("wal")
    }
}
```
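As a quick illustration of the two branches, the sketch below repeats `tenant_wal_dir` with a trimmed-down `TenantId` so that it compiles on its own. The `/var/lib/tidal` data directory and the `example_paths` helper are hypothetical and exist only for this example.

```rust
use std::path::{Path, PathBuf};

/// Trimmed-down copy of `TenantId` from the design, enough for this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct TenantId(pub u64);

impl TenantId {
    pub const DEFAULT: Self = Self(0);
}

/// Same logic as the function in the design above.
pub fn tenant_wal_dir(data_dir: &Path, tenant_id: TenantId) -> PathBuf {
    if tenant_id == TenantId::DEFAULT {
        data_dir.join("wal")
    } else {
        data_dir
            .join("tenants")
            .join(tenant_id.0.to_string())
            .join("wal")
    }
}

/// Hypothetical helper: resolve the WAL directory for the default tenant
/// and for tenant 42 under an example data directory.
pub fn example_paths() -> (PathBuf, PathBuf) {
    let data_dir = Path::new("/var/lib/tidal"); // hypothetical location
    (
        tenant_wal_dir(data_dir, TenantId::DEFAULT),
        tenant_wal_dir(data_dir, TenantId(42)),
    )
}
```

Under these assumptions the default tenant resolves to `/var/lib/tidal/wal` and tenant 42 to `/var/lib/tidal/tenants/42/wal`. Note that the directory name uses the raw integer (`42`), not the `Display` form (`t42`).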
## Acceptance Criteria
- [ ] `TenantId` is `Copy + Clone + Debug + Eq + Hash + Ord + Default + Serialize + Deserialize`
- [ ] `TenantId::DEFAULT` is `TenantId(0)`; all existing code using `TenantId(0)` works unchanged
- [ ] `TenantRateLimiter::try_acquire()` returns `TidalError::QuotaExceeded` within 1ms when token bucket is empty
- [ ] Token bucket refills at the configured rate: starting from an empty bucket, one token is available after sleeping `1/rate` seconds
- [ ] WAL directory for `TenantId::DEFAULT` is `{data_dir}/wal/` (unchanged from m1p5)
- [ ] WAL directory for `TenantId(42)` is `{data_dir}/tenants/42/wal/`
- [ ] Unit test: configure 100 signals/sec (bucket starts full at the 2x burst capacity of 200 tokens), write 400 signals in a tight loop, verify ~200 succeed and ~200 receive `QuotaExceeded`
- [ ] `cargo clippy -D warnings` and `cargo fmt` pass
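
The rate-limit criteria are easier to test deterministically if the clock is passed in rather than read from `crate::util::now_ns()`. The sketch below is one possible shape for such a test double, not the `TenantRateLimiter` from the design: `Bucket` and `count_admitted` are hypothetical names, and the limiter is single-threaded on purpose. It keeps the design's parameters (bucket starts full at twice the per-second rate, one token per write).

```rust
/// Hypothetical token bucket with an explicit clock, so tests need no sleeping.
pub struct Bucket {
    tokens: f64,
    rate_per_ns: f64,
    max_tokens: f64,
    last_ns: u64,
}

impl Bucket {
    /// Starts full, with capacity equal to 2x the per-second rate.
    pub fn new(rate_per_sec: u32, now_ns: u64) -> Self {
        let max_tokens = f64::from(rate_per_sec) * 2.0;
        Self {
            tokens: max_tokens,
            rate_per_ns: f64::from(rate_per_sec) / 1e9,
            max_tokens,
            last_ns: now_ns,
        }
    }

    /// Refill for the elapsed time, then try to take one token.
    pub fn try_acquire(&mut self, now_ns: u64) -> bool {
        let elapsed = now_ns.saturating_sub(self.last_ns);
        self.tokens = (self.tokens + elapsed as f64 * self.rate_per_ns).min(self.max_tokens);
        // Never move the refill timestamp backwards.
        self.last_ns = self.last_ns.max(now_ns);
        if self.tokens < 1.0 {
            return false;
        }
        self.tokens -= 1.0;
        true
    }
}

/// Hypothetical helper: attempt `attempts` writes at a fixed instant
/// and count how many are admitted.
pub fn count_admitted(bucket: &mut Bucket, attempts: usize, now_ns: u64) -> usize {
    (0..attempts).filter(|_| bucket.try_acquire(now_ns)).count()
}
```

With a rate of 100 signals/sec, a burst of 400 attempts at a single instant admits exactly 200 in this sketch, because the bucket starts with 200 tokens and nothing refills while the clock stands still. Advancing the clock by a little more than `1/rate` (for example 15 ms) then admits exactly one more write; testing at exactly 10 ms would depend on floating-point rounding at the boundary.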