m8p5: Control Plane, Multi-Tenancy, and Routing
Delivers
Tenant isolation, routing configuration, and operational tooling for a hosted multi-tenant deployment. Each tenant (agent workspace) gets its own WAL namespace and resource quotas. The control plane manages shard-to-region assignment, tenant placement, and rolling upgrades. A tenant can be migrated to a new region by changing routing configuration only.
Deliverables:
- `TenantId(u64)`: tenant identity type; WAL segments namespaced by tenant
- `TenantConfig`: per-tenant quota (max signals/sec, max entities, max storage bytes), residency policy (required regions)
- `TenantRouter`: extends `ShardRouter` with tenant-aware routing; tenant -> shard mapping
- `ControlPlane`: manages cluster topology (shard assignments, tenant placement, region health)
- `TenantMigration`: moves a tenant to a new shard/region by shipping WAL segments + state snapshot; zero-downtime via dual-write window
- `RollingUpgradeCoordinator`: upgrades nodes one at a time with drain + upgrade + rejoin; uses WAL shipping to keep followers current during the window
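As a rough illustration of the first two deliverables, the sketch below shows one possible shape for the identity and config types, plus a helper that builds a per-tenant WAL directory under `{data_dir}/tenants/{tenant_id}/wal/`. The field names, the `wal_dir` helper, and the decimal rendering of the tenant id are assumptions made for this example, not the planned API.

```rust
use std::path::{Path, PathBuf};

/// Tenant identity. The derive list covers Copy, Clone, Debug, Eq, Hash, Ord.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct TenantId(pub u64);

/// Per-tenant quota and residency settings (field names are hypothetical).
#[derive(Clone, Debug)]
pub struct TenantConfig {
    pub max_signals_per_sec: u64,
    pub max_entities: u64,
    pub max_storage_bytes: u64,
    /// Regions the tenant's data may live in; empty means no restriction.
    pub required_regions: Vec<String>,
}

impl TenantId {
    /// Builds `{data_dir}/tenants/{tenant_id}/wal` for this tenant.
    /// Rendering the id as a decimal string is an assumption.
    pub fn wal_dir(&self, data_dir: &Path) -> PathBuf {
        data_dir
            .join("tenants")
            .join(self.0.to_string())
            .join("wal")
    }
}
```

Keeping the path construction on `TenantId` means every caller derives the same directory from the same id, which is one simple way to keep namespacing consistent across the WAL and open paths.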
Dependencies
- Requires: Phase 8.2 (WAL shipping), Phase 8.3 (reconciliation), Phase 8.4 (session continuity)
- Files modified:
  - `tidal/src/db/config.rs` -- add tenant configuration fields
  - `tidal/src/replication/shard.rs` -- extend `ShardRouter` with tenant routing
  - `tidal/src/wal/segment.rs` -- tenant-namespaced segment directories
  - `tidal/src/db/open.rs` -- tenant-scoped initialization
- Files created:
  - `tidal/src/replication/tenant.rs` -- `TenantId`, `TenantConfig`, `TenantRouter`
  - `tidal/src/replication/control.rs` -- `ControlPlane`, topology management
  - `tidal/src/replication/migration.rs` -- `TenantMigration`
  - `tidal/src/replication/upgrade.rs` -- `RollingUpgradeCoordinator`
Research References
- `thoughts.md` -- Part I/Citadel (per-tenant filesystem isolation: "every tenant is an island")
Acceptance Criteria (Phase Level)
- `TenantId(u64)` is `Copy + Clone + Debug + Eq + Hash + Ord`; WAL segment directories are namespaced as `{data_dir}/tenants/{tenant_id}/wal/`
- `TenantConfig` enforces rate limits: signals/sec (token bucket), max entities (hard cap), max storage bytes (checked on write); violations return `TidalError::QuotaExceeded`
- `TenantRouter` maps `(TenantId, EntityId) -> (RegionId, ShardId)`; default is hash-based; residency policy constrains which regions a tenant's data can reside in
- `ControlPlane` exposes cluster health: per-shard entity count, signal throughput, replication lag, disk usage; serializable to JSON for monitoring integration
- Tenant migration test: move tenant from shard A to shard B; during migration, dual-write ensures no signal loss; after migration, shard A's tenant data is garbage-collected; total downtime = 0 (reads served from both shards during migration window)
- Rolling upgrade: upgrade 1 of 3 nodes; WAL shipping continues to remaining 2; upgraded node rejoins and catches up from WAL; total query availability = 100% during the upgrade window
- Per-tenant WAL isolation: a misbehaving tenant (burst of 100K signals/sec) is throttled without affecting other tenants on the same shard; rate limiter returns `TidalError::QuotaExceeded` within 1ms
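The routing criterion above could be sketched as follows. This is only an illustration under stated assumptions: the id newtypes, the `regions` and `shards_per_region` fields, and the `route` signature are hypothetical, and `DefaultHasher` stands in for whatever stable hash the real router would use (its output is not guaranteed to stay the same across Rust releases, so it would not be suitable for persisted placement).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct TenantId(pub u64);
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct EntityId(pub u64);
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct RegionId(pub u16);
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct ShardId(pub u32);

/// Hypothetical router: every region hosts `shards_per_region` shards.
pub struct TenantRouter {
    pub regions: Vec<RegionId>,
    pub shards_per_region: u32,
}

impl TenantRouter {
    /// Routes an entity. `allowed` is the tenant's residency policy
    /// (an empty slice means any region). Returns `None` when no
    /// configured region satisfies the policy.
    pub fn route(
        &self,
        tenant: TenantId,
        entity: EntityId,
        allowed: &[RegionId],
    ) -> Option<(RegionId, ShardId)> {
        let candidates: Vec<RegionId> = self
            .regions
            .iter()
            .copied()
            .filter(|r| allowed.is_empty() || allowed.contains(r))
            .collect();
        if candidates.is_empty() || self.shards_per_region == 0 {
            return None;
        }
        // The region depends on the tenant only, so all of a tenant's
        // entities land in the same region.
        let region = candidates[(hash_of(&tenant) % candidates.len() as u64) as usize];
        // The shard depends on both tenant and entity (hash-based default).
        let shard = (hash_of(&(tenant, entity)) % self.shards_per_region as u64) as u32;
        Some((region, ShardId(shard)))
    }
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

Filtering regions before hashing means the residency policy can never be violated by a hash collision; the cost is that changing the policy changes the candidate set and may move the tenant.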
Task Execution Order
Task 01: TenantId + TenantConfig ──────────┐
                                           ├──> Task 03: ControlPlane
Task 02: TenantRouter ─────────────────────┤
                                           ├──> Task 04: TenantMigration
                                           │
                                           └──> Task 05: RollingUpgrade
                                                          │
                                                          v
                                      Task 06: Multi-Tenancy Integration Tests
Tasks 01 and 02 are parallelizable. Tasks 03, 04, 05 depend on both. Task 06 depends on all.
Module Location
| File | Status | Contains |
|---|---|---|
| `tidal/src/replication/tenant.rs` | NEW | `TenantId`, `TenantConfig`, `TenantRouter`, quota enforcement |
| `tidal/src/replication/control.rs` | NEW | `ControlPlane`, cluster topology, health metrics |
| `tidal/src/replication/migration.rs` | NEW | `TenantMigration`, dual-write protocol |
| `tidal/src/replication/upgrade.rs` | NEW | `RollingUpgradeCoordinator` |
| `tidal/src/db/config.rs` | MODIFIED | Tenant config fields |
| `tidal/src/replication/shard.rs` | MODIFIED | Tenant-aware routing |
| `tidal/src/wal/segment.rs` | MODIFIED | Tenant-namespaced directories |
| `tidal/src/db/open.rs` | MODIFIED | Tenant-scoped initialization |
Notes
Tenant isolation follows Citadel's model
Per-tenant filesystem directories, per-tenant WAL files, per-tenant rate limiters. The OS enforces the boundary. A misbehaving tenant cannot affect others because its writes go to separate files and its rate limiter is checked before the WAL write.
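A minimal sketch of "rate limiter is checked before the WAL write", assuming a token bucket sized to one second of the tenant's signals/sec quota. The `TokenBucket` type, the `write_signal` function, and the in-memory `Vec` standing in for a tenant's WAL are all hypothetical; only the `QuotaExceeded` error name comes from the plan.

```rust
use std::time::Instant;

#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    QuotaExceeded,
}

/// Token bucket holding at most one second's worth of signals.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(rate_per_sec: u64, now: Instant) -> Self {
        let rate = rate_per_sec as f64;
        Self { capacity: rate, tokens: rate, refill_per_sec: rate, last_refill: now }
    }

    /// Takes one token or fails. `now` is passed in so tests control time.
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), TidalError> {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            Err(TidalError::QuotaExceeded)
        }
    }
}

/// Hypothetical write path: the quota check runs first, and the WAL
/// (here just a Vec) is only touched when the check succeeds.
pub fn write_signal(
    bucket: &mut TokenBucket,
    wal: &mut Vec<Vec<u8>>,
    payload: &[u8],
    now: Instant,
) -> Result<(), TidalError> {
    bucket.try_acquire(now)?;
    wal.push(payload.to_vec());
    Ok(())
}
```

Because the check is pure arithmetic on in-memory state, a rejected write never reaches disk, which is what keeps a bursting tenant from consuming I/O that other tenants on the shard need.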
Migration via dual-write
During migration, writes for the migrating tenant go to both the old shard and the new shard. After the new shard has caught up (verified by seqno matching), reads are switched to the new shard, and the old shard's tenant data is garbage-collected. This is the CockroachDB range-split model adapted for tenant migration.
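The dual-write sequence described above can be sketched as a small state machine. This is a simplified in-memory model, not the planned `TenantMigration` API: `ShardLog`, `MigrationPhase`, and every method name are assumptions, and `ship_backlog` stands in for shipping WAL segments plus a state snapshot.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MigrationPhase {
    DualWrite, // writes go to both shards, reads stay on the old shard
    CutOver,   // new shard has caught up, reads served by the new shard
    Done,      // old shard's tenant data has been garbage-collected
}

/// In-memory stand-in for one shard's per-tenant log of (seqno, payload).
#[derive(Default)]
pub struct ShardLog {
    pub entries: Vec<(u64, Vec<u8>)>,
}

impl ShardLog {
    pub fn last_seqno(&self) -> Option<u64> {
        self.entries.last().map(|(seq, _)| *seq)
    }
}

pub struct TenantMigration {
    pub phase: MigrationPhase,
    pub old: ShardLog,
    pub new: ShardLog,
    next_seqno: u64,
}

impl TenantMigration {
    /// Begins a migration; the new shard starts empty.
    pub fn start(old: ShardLog) -> Self {
        let next_seqno = old.last_seqno().map_or(0, |seq| seq + 1);
        Self { phase: MigrationPhase::DualWrite, old, new: ShardLog::default(), next_seqno }
    }

    /// Appends a write. During the dual-write window it lands on both shards.
    pub fn write(&mut self, payload: &[u8]) -> u64 {
        let seq = self.next_seqno;
        self.next_seqno += 1;
        if self.phase == MigrationPhase::DualWrite {
            self.old.entries.push((seq, payload.to_vec()));
        }
        self.new.entries.push((seq, payload.to_vec()));
        seq
    }

    /// Copies entries written before the window opened onto the new shard.
    pub fn ship_backlog(&mut self) {
        let first_new = self.new.entries.first().map_or(u64::MAX, |(seq, _)| *seq);
        let mut merged: Vec<(u64, Vec<u8>)> = self
            .old
            .entries
            .iter()
            .filter(|(seq, _)| *seq < first_new)
            .cloned()
            .collect();
        merged.append(&mut self.new.entries);
        self.new.entries = merged;
    }

    /// Switches reads once the seqnos match. Returns true once reads are
    /// served by the new shard.
    pub fn try_cut_over(&mut self) -> bool {
        let caught_up = self.new.entries.len() == self.old.entries.len()
            && self.new.last_seqno() == self.old.last_seqno();
        if self.phase == MigrationPhase::DualWrite && caught_up {
            self.phase = MigrationPhase::CutOver;
        }
        self.phase != MigrationPhase::DualWrite
    }

    /// Garbage-collects the old shard's copy after cut-over.
    pub fn gc_old(&mut self) {
        if self.phase == MigrationPhase::CutOver {
            self.old.entries.clear();
            self.phase = MigrationPhase::Done;
        }
    }
}
```

In this model the cut-over is refused until the backlog has been shipped, so a read can never be routed to a shard that is missing earlier signals.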
Control plane is embedded, not external
The ControlPlane runs within the leader node's process (or a designated coordinator node). It is not a separate service. This matches tidalDB's embeddable philosophy.
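One way to picture an embedded control plane is a plain struct shared in-process behind `Arc<Mutex<...>>`, with no network hop between the database and its coordinator. The sketch below is an assumption-heavy illustration: the `ShardHealth` fields mirror the health metrics listed in the acceptance criteria, but their names and units, the `report` and `health_json` methods, and the hand-rolled JSON (real code would more likely use a serialization library) are all hypothetical.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

/// Per-shard health metrics (names and units are assumptions).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ShardHealth {
    pub entity_count: u64,
    pub signals_per_sec: u64,
    pub replication_lag_ms: u64,
    pub disk_bytes: u64,
}

/// Lives inside the leader's process; there is no separate service.
#[derive(Default)]
pub struct ControlPlane {
    // BTreeMap keeps shards ordered by id, so the JSON output is stable.
    shards: BTreeMap<u32, ShardHealth>,
}

impl ControlPlane {
    /// Records the latest metrics for a shard.
    pub fn report(&mut self, shard: u32, health: ShardHealth) {
        self.shards.insert(shard, health);
    }

    /// Renders cluster health as a JSON array, one object per shard.
    pub fn health_json(&self) -> String {
        let rows: Vec<String> = self
            .shards
            .iter()
            .map(|(id, h)| {
                format!(
                    "{{\"shard\":{},\"entity_count\":{},\"signals_per_sec\":{},\"replication_lag_ms\":{},\"disk_bytes\":{}}}",
                    id, h.entity_count, h.signals_per_sec, h.replication_lag_ms, h.disk_bytes
                )
            })
            .collect();
        format!("[{}]", rows.join(","))
    }
}

/// Handle the database and its background tasks can clone and share.
pub type SharedControlPlane = Arc<Mutex<ControlPlane>>;
```

A monitoring integration would then lock the shared handle, call `health_json`, and expose the resulting string however the host application prefers.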
Done When
A developer can configure 3 tenants on a 3-shard cluster, apply per-tenant rate limits, migrate a tenant from one shard to another with zero downtime, perform a rolling upgrade of all nodes, and observe that per-tenant isolation prevents noisy-neighbor effects throughout.