# m8p5: Control Plane, Multi-Tenancy, and Routing

## Delivers

Tenant isolation, routing configuration, and operational tooling for a hosted multi-tenant deployment. Each tenant (agent workspace) gets its own WAL namespace and resource quotas. The control plane manages shard-to-region assignment, tenant placement, and rolling upgrades. A tenant can be migrated to a new region by changing routing configuration only.

Deliverables:

- `TenantId(u64)`: tenant identity type; WAL segments namespaced by tenant
- `TenantConfig`: per-tenant quota (max signals/sec, max entities, max storage bytes), residency policy (required regions)
- `TenantRouter`: extends `ShardRouter` with tenant-aware routing; tenant -> shard mapping
- `ControlPlane`: manages cluster topology (shard assignments, tenant placement, region health)
- `TenantMigration`: moves a tenant to a new shard/region by shipping WAL segments + state snapshot; zero-downtime via dual-write window
- `RollingUpgradeCoordinator`: upgrades nodes one at a time with drain + upgrade + rejoin; uses WAL shipping to keep followers current during the window

## Dependencies

- **Requires:** Phase 8.2 (WAL shipping), Phase 8.3 (reconciliation), Phase 8.4 (session continuity)
- **Files modified:**
  - `tidal/src/db/config.rs` -- add tenant configuration fields
  - `tidal/src/replication/shard.rs` -- extend `ShardRouter` with tenant routing
  - `tidal/src/wal/segment.rs` -- tenant-namespaced segment directories
  - `tidal/src/db/open.rs` -- tenant-scoped initialization
- **Files created:**
  - `tidal/src/replication/tenant.rs` -- `TenantId`, `TenantConfig`, `TenantRouter`
  - `tidal/src/replication/control.rs` -- `ControlPlane`, topology management
  - `tidal/src/replication/migration.rs` -- `TenantMigration`
  - `tidal/src/replication/upgrade.rs` -- `RollingUpgradeCoordinator`

## Research References

- `thoughts.md` -- Part I/Citadel (per-tenant filesystem isolation: "every tenant is an island")

## Acceptance Criteria (Phase Level)

- [ ] `TenantId(u64)` is `Copy + Clone + Debug + Eq + Hash + Ord`; WAL segment directories are namespaced as `{data_dir}/tenants/{tenant_id}/wal/`
- [ ] `TenantConfig` enforces rate limits: signals/sec (token bucket), max entities (hard cap), max storage bytes (checked on write); violations return `TidalError::QuotaExceeded`
- [ ] `TenantRouter` maps `(TenantId, EntityId) -> (RegionId, ShardId)`; default is hash-based; residency policy constrains which regions a tenant's data can reside in
- [ ] `ControlPlane` exposes cluster health: per-shard entity count, signal throughput, replication lag, disk usage; serializable to JSON for monitoring integration
- [ ] Tenant migration test: move tenant from shard A to shard B; during migration, dual-write ensures no signal loss; after migration, shard A's tenant data is garbage-collected; total downtime = 0 (reads served from both shards during migration window)
- [ ] Rolling upgrade: upgrade 1 of 3 nodes; WAL shipping continues to remaining 2; upgraded node rejoins and catches up from WAL; total query availability = 100% during the upgrade window
- [ ] Per-tenant WAL isolation: a misbehaving tenant (burst of 100K signals/sec) is throttled without affecting other tenants on the same shard; rate limiter returns `TidalError::QuotaExceeded` within 1ms

## Task Execution Order

```
Task 01: TenantId + TenantConfig ──────────┐
                                           ├──> Task 03: ControlPlane
Task 02: TenantRouter ─────────────────────┤
                                           ├──> Task 04: TenantMigration
                                           │
                                           └──> Task 05: RollingUpgrade
                                                          │
                                                          v
                                Task 06: Multi-Tenancy Integration Tests
```

Tasks 01 and 02 are parallelizable. Tasks 03, 04, 05 depend on both. Task 06 depends on all.
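
As a starting point for Task 01, the following is a minimal sketch of `TenantId`, the tenant-namespaced WAL path, and a token-bucket limiter for the signals/sec quota. It is not the planned implementation: `SignalRateLimiter`, its fields, `wal_dir`, and the single-variant `TidalError` are hypothetical stand-ins, and the bucket capacity is assumed to equal one second of the sustained rate.

```rust
use std::path::{Path, PathBuf};
use std::time::Instant;

/// Tenant identity with the derives listed in the acceptance criteria.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct TenantId(pub u64);

impl TenantId {
    /// Builds `{data_dir}/tenants/{tenant_id}/wal` (helper name is an assumption).
    pub fn wal_dir(self, data_dir: &Path) -> PathBuf {
        data_dir.join("tenants").join(self.0.to_string()).join("wal")
    }
}

/// Hypothetical stand-in for the real error type; only the variant named
/// in this document is modeled.
#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    QuotaExceeded,
}

/// Token bucket for the per-tenant signals/sec quota (names are assumptions).
pub struct SignalRateLimiter {
    capacity: f64,       // maximum burst size, in signals
    refill_per_sec: f64, // sustained rate, in signals per second
    tokens: f64,         // tokens currently available
    last_refill: Instant,
}

impl SignalRateLimiter {
    /// Starts with a full bucket sized to one second of the sustained rate.
    pub fn new(max_signals_per_sec: u32, now: Instant) -> Self {
        let rate = f64::from(max_signals_per_sec);
        Self { capacity: rate, refill_per_sec: rate, tokens: rate, last_refill: now }
    }

    /// Refills based on elapsed time, then takes one token or rejects.
    /// The caller passes `now`, so the check does no I/O and is easy to test.
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), TidalError> {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            Err(TidalError::QuotaExceeded)
        }
    }
}
```

Calling `try_acquire` before the WAL write keeps a rejected signal from touching the tenant's files. The check itself is a few arithmetic operations on in-memory state, which is one plausible way to stay inside the 1ms rejection budget.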

## Module Location

| File | Status | Contains |
|------|--------|----------|
| `tidal/src/replication/tenant.rs` | NEW | `TenantId`, `TenantConfig`, `TenantRouter`, quota enforcement |
| `tidal/src/replication/control.rs` | NEW | `ControlPlane`, cluster topology, health metrics |
| `tidal/src/replication/migration.rs` | NEW | `TenantMigration`, dual-write protocol |
| `tidal/src/replication/upgrade.rs` | NEW | `RollingUpgradeCoordinator` |
| `tidal/src/db/config.rs` | MODIFIED | Tenant config fields |
| `tidal/src/replication/shard.rs` | MODIFIED | Tenant-aware routing |
| `tidal/src/wal/segment.rs` | MODIFIED | Tenant-namespaced directories |
| `tidal/src/db/open.rs` | MODIFIED | Tenant-scoped initialization |

## Notes

### Tenant isolation follows Citadel's model

Per-tenant filesystem directories, per-tenant WAL files, per-tenant rate limiters. The OS enforces the boundary. A misbehaving tenant cannot affect others because its writes go to separate files and its rate limiter is checked before the WAL write.

### Migration via dual-write

During migration, writes for the migrating tenant go to both the old shard and the new shard. After the new shard has caught up (verified by seqno matching), reads are switched to the new shard, and the old shard's tenant data is garbage-collected. This is the CockroachDB range-split model adapted for tenant migration.

### Control plane is embedded, not external

The `ControlPlane` runs within the leader node's process (or a designated coordinator node). It is not a separate service. This matches tidalDB's embeddable philosophy.

## Done When

A developer can configure 3 tenants on a 3-shard cluster, apply per-tenant rate limits, migrate a tenant from one shard to another with zero downtime, perform a rolling upgrade of all nodes, and observe that per-tenant isolation prevents noisy-neighbor effects throughout.
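
For reference, the dual-write migration described in the Notes can be sketched as a two-phase state machine. This is an illustration under assumptions, not the planned `TenantMigration` API: `MigrationPhase`, the method names, and the local `ShardId` newtype are hypothetical, and reads are assumed to stay on the source shard until cutover.

```rust
/// Hypothetical stand-in for the real shard identifier type.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct ShardId(pub u32);

/// Phases of a tenant migration (names are assumptions).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum MigrationPhase {
    /// Writes go to both shards; reads stay on the source shard.
    DualWrite,
    /// The target has caught up; reads and writes use the target only.
    CutOver,
}

pub struct TenantMigration {
    source: ShardId,
    target: ShardId,
    phase: MigrationPhase,
}

impl TenantMigration {
    /// Opens the dual-write window for a tenant moving from `source` to `target`.
    pub fn start(source: ShardId, target: ShardId) -> Self {
        Self { source, target, phase: MigrationPhase::DualWrite }
    }

    pub fn phase(&self) -> MigrationPhase {
        self.phase
    }

    /// Shards that must receive every write for the migrating tenant.
    pub fn write_targets(&self) -> Vec<ShardId> {
        match self.phase {
            MigrationPhase::DualWrite => vec![self.source, self.target],
            MigrationPhase::CutOver => vec![self.target],
        }
    }

    /// Shard that serves reads for the migrating tenant.
    pub fn read_target(&self) -> ShardId {
        match self.phase {
            MigrationPhase::DualWrite => self.source,
            MigrationPhase::CutOver => self.target,
        }
    }

    /// Switches to the target once its seqno matches the source's.
    /// Returns true when cutover happened; the caller may then
    /// garbage-collect the tenant's data on the source shard.
    pub fn try_cut_over(&mut self, source_seqno: u64, target_seqno: u64) -> bool {
        if self.phase == MigrationPhase::DualWrite && source_seqno == target_seqno {
            self.phase = MigrationPhase::CutOver;
            true
        } else {
            false
        }
    }
}
```

Keeping the phase in one place means the router only has to ask `write_targets` and `read_target`; it never needs to know how far the migration has progressed.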