13 KiB
Task 02: Degradation + Rate Limiting + Session Cleanup UAT Tests
Delivers
Three integration tests in tidal/tests/m7_uat.rs proving production load-handling behavior:
uat_degradation_progression-- Simulate concurrent load above threshold; verifydegradation_levelin response changes as load increases; verify all queries still return results even under degradation.uat_rate_limiting_isolation-- Configure a low rate limit (10 signals/sec); exhaust the bucket; verify excess writes returnTidalError::RateLimited; verify a second session is unaffected.uat_session_auto_cleanup-- Create a session with a short TTL policy (30s); wait for the sweeper to fire; verify the session is auto-closed withauto_closed: true.
Complexity: L
Dependencies
- m7p2 complete (
DegradationLevelenum, load detector,TidalError::RateLimited,TidalError::Backpressure, session TTL auto-cleanup sweeper,SessionSummary.auto_closed) - m7p1 complete (stable crash recovery before load testing)
Technical Design
Test 4: Degradation progression under load
#[test]
fn uat_degradation_progression() {
let db = TidalDb::builder()
.ephemeral()
.with_schema(m7_uat_schema())
// Configure low degradation thresholds for testing.
// ReducedCandidates at 5 in-flight, CoarseAggregates at 10, NoDiversity at 15.
.with_degradation_thresholds(5, 10, 15)
.open()
.unwrap();
let now = Timestamp::now();
// Write 200 items with signals so queries have work to do.
for id in 1..=200u64 {
let mut meta = HashMap::new();
meta.insert("title".to_string(), format!("Degrade Item {id}"));
meta.insert("category".to_string(), "test".to_string());
db.write_item_with_metadata(EntityId::new(id), &meta).unwrap();
}
for id in 1..=100u64 {
db.signal("view", EntityId::new(id), 1.0, now).unwrap();
}
// At zero load, degradation should be Full (no degradation).
let results = db
.retrieve(
&Retrieve::builder()
.profile("trending")
.limit(10)
.build()
.unwrap(),
)
.unwrap();
assert!(!results.items.is_empty(), "baseline query should return results");
assert_eq!(
results.stats.degradation_level,
DegradationLevel::Full,
"at zero load, degradation level should be Full"
);
// Simulate load by artificially inflating the in-flight counter.
// The load detector exposes a test-only method to set the in-flight count.
db.load_detector().set_in_flight_for_test(6);
let results_reduced = db
.retrieve(
&Retrieve::builder()
.profile("trending")
.limit(10)
.build()
.unwrap(),
)
.unwrap();
assert!(!results_reduced.items.is_empty(), "ReducedCandidates query should still return results");
assert_eq!(
results_reduced.stats.degradation_level,
DegradationLevel::ReducedCandidates,
"at 6 in-flight (threshold 5), should be ReducedCandidates"
);
// Push to CoarseAggregates.
db.load_detector().set_in_flight_for_test(11);
let results_coarse = db
.retrieve(
&Retrieve::builder()
.profile("trending")
.limit(10)
.build()
.unwrap(),
)
.unwrap();
assert!(!results_coarse.items.is_empty(), "CoarseAggregates query should still return results");
assert_eq!(
results_coarse.stats.degradation_level,
DegradationLevel::CoarseAggregates,
"at 11 in-flight (threshold 10), should be CoarseAggregates"
);
// Push to NoDiversity.
db.load_detector().set_in_flight_for_test(16);
let results_no_div = db
.retrieve(
&Retrieve::builder()
.profile("trending")
.limit(10)
.build()
.unwrap(),
)
.unwrap();
assert!(!results_no_div.items.is_empty(), "NoDiversity query should still return results");
assert_eq!(
results_no_div.stats.degradation_level,
DegradationLevel::NoDiversity,
"at 16 in-flight (threshold 15), should be NoDiversity"
);
// Reset and verify recovery.
db.load_detector().set_in_flight_for_test(0);
let results_recovered = db
.retrieve(
&Retrieve::builder()
.profile("trending")
.limit(10)
.build()
.unwrap(),
)
.unwrap();
assert_eq!(
results_recovered.stats.degradation_level,
DegradationLevel::Full,
"after load drops, degradation should return to Full"
);
}
Test 5: Per-agent rate limiting isolation
#[test]
fn uat_rate_limiting_isolation() {
// Build schema with a rate-limited session policy: 10 signals/sec.
let mut builder = SchemaBuilder::new();
let _ = builder
.signal(
"view",
EntityKind::Item,
DecaySpec::Exponential {
half_life: Duration::from_secs(7 * 24 * 3600),
},
)
.windows(&[Window::AllTime])
.velocity(false)
.add();
builder.session_policy(
"rate_limited",
tidaldb::schema::AgentPolicy::builder()
.max_session_duration(Duration::from_secs(300))
.max_signals_per_second(10)
.build(),
);
builder.session_policy(
"unlimited",
tidaldb::schema::AgentPolicy::builder()
.max_session_duration(Duration::from_secs(300))
.build(),
);
let schema = builder.build().unwrap();
let db = TidalDb::builder()
.ephemeral()
.with_schema(schema)
.open()
.unwrap();
// Write some items.
for id in 1..=10u64 {
let mut meta = HashMap::new();
meta.insert("title".to_string(), format!("RL Item {id}"));
db.write_item_with_metadata(EntityId::new(id), &meta).unwrap();
}
// Start two sessions: one rate-limited, one unlimited.
let session_a = db
.start_session(1, "agent_a", "rate_limited", HashMap::new())
.unwrap();
let session_b = db
.start_session(2, "agent_b", "unlimited", HashMap::new())
.unwrap();
let now = Timestamp::now();
// Session A: write 50 signals rapidly. First ~10 should succeed;
// remaining should return RateLimited.
let mut a_success = 0u64;
let mut a_limited = 0u64;
for _ in 0..50 {
match db.session_signal(&session_a, "view", EntityId::new(1), 1.0, now, None) {
Ok(()) => a_success += 1,
Err(e) => {
// Verify the error is RateLimited, not some other error.
let err_str = format!("{e}");
assert!(
err_str.contains("rate") || err_str.contains("Rate") || err_str.contains("limited"),
"expected RateLimited error, got: {err_str}"
);
a_limited += 1;
}
}
}
assert!(
a_success > 0,
"at least some signals should succeed before rate limit kicks in"
);
assert!(
a_limited > 0,
"some signals should be rate-limited (wrote 50, limit is 10/sec)"
);
assert!(
a_success <= 15,
"no more than ~15 signals should succeed at 10/sec (allowing some token accumulation), got {a_success}"
);
// Session B (unlimited): should be completely unaffected.
let mut b_success = 0u64;
for _ in 0..50 {
match db.session_signal(&session_b, "view", EntityId::new(2), 1.0, now, None) {
Ok(()) => b_success += 1,
Err(e) => panic!("unlimited session should not be rate-limited: {e}"),
}
}
assert_eq!(
b_success, 50,
"unlimited session should accept all 50 signals"
);
db.close_session(session_a).unwrap();
db.close_session(session_b).unwrap();
}
Test 6: Session auto-cleanup after TTL
#[test]
fn uat_session_auto_cleanup() {
// Schema with a very short session TTL (30 seconds).
let mut builder = SchemaBuilder::new();
let _ = builder
.signal(
"view",
EntityKind::Item,
DecaySpec::Exponential {
half_life: Duration::from_secs(7 * 24 * 3600),
},
)
.windows(&[Window::AllTime])
.velocity(false)
.add();
builder.session_policy(
"short_ttl",
tidaldb::schema::AgentPolicy::builder()
.max_session_duration(Duration::from_secs(30))
.build(),
);
let schema = builder.build().unwrap();
let db = TidalDb::builder()
.ephemeral()
.with_schema(schema)
// Configure sweeper interval to 5 seconds for faster test execution.
.with_session_sweep_interval(Duration::from_secs(5))
.open()
.unwrap();
// Write an item so session signals have a target.
let mut meta = HashMap::new();
meta.insert("title".to_string(), "Cleanup Item".to_string());
db.write_item_with_metadata(EntityId::new(1), &meta).unwrap();
// Start a session and write some signals.
let handle = db
.start_session(1, "agent_cleanup", "short_ttl", HashMap::new())
.unwrap();
let session_id = handle.id;
let now = Timestamp::now();
db.session_signal(&handle, "view", EntityId::new(1), 1.0, now, None)
.unwrap();
// The session is active.
let active = db.active_sessions();
assert!(
active.iter().any(|s| s.id == session_id),
"session should be active immediately after creation"
);
// Drop the handle without closing -- simulates an agent that forgot to close.
// The handle's closed flag is NOT set, so the sweeper will find it expired.
std::mem::forget(handle);
// Wait for TTL + sweeper interval + margin.
// TTL = 30s, sweeper runs every 5s, so wait 35s to be safe.
std::thread::sleep(Duration::from_secs(35));
// Session should have been auto-closed by the sweeper.
let active_after = db.active_sessions();
assert!(
!active_after.iter().any(|s| s.id == session_id),
"session should no longer be active after TTL + sweeper interval"
);
// Verify the session was archived with auto_closed = true.
let snapshot = db.session_snapshot(session_id).unwrap();
assert!(
snapshot.auto_closed,
"auto-closed session should have auto_closed = true in its snapshot"
);
// Verify the summary reflects the signals that were written.
assert!(
snapshot.signals_written >= 1,
"auto-closed session should preserve signal count"
);
}
Imports needed (additions to the shared header)
use tidaldb::query::retrieve::DegradationLevel; // From m7p2
Acceptance Criteria
uat_degradation_progressionpasses: degradation level progresses Full -> ReducedCandidates -> CoarseAggregates -> NoDiversity as simulated load increases; all queries return results at every level; level recovers to Full when load dropsuat_rate_limiting_isolationpasses: rate-limited session (10/sec) rejects excess writes withRateLimitederror; unlimited session on a different agent accepts all 50 writes; error message contains "rate" or "limited"uat_session_auto_cleanuppasses: session with 30s TTL is auto-closed by sweeper within 35s;auto_closed: truein snapshot; signal count preserveduat_session_auto_cleanupusesstd::mem::forget(handle)to simulate an agent that abandons a session without closing- Each test completes within its timeout: degradation < 5s, rate limiting < 5s, cleanup < 45s
cargo clippy --manifest-path tidal/Cargo.toml -- -D warningspasses
Test Strategy
Degradation test uses a test-only setter (set_in_flight_for_test) on the load detector to deterministically control the in-flight count without spinning up actual concurrent threads. This avoids flaky timing issues and validates the threshold logic directly.
Rate limiting test writes 50 signals synchronously in a tight loop. At 10 tokens/sec with no refill during the burst, roughly 10 should succeed and 40 should fail. The assertion allows up to 15 successes to account for initial token accumulation and refill during the loop. The critical assertion is that the second session (unlimited) accepts all 50 writes -- proving per-session isolation.
Session cleanup test is the only test with a wall-clock wait (35s). It configures the sweeper to run every 5 seconds and sets a 30-second TTL. The std::mem::forget(handle) pattern simulates an agent that crashes without calling close_session. After 35 seconds, the sweeper should have detected the expired session and auto-closed it. The test verifies the auto_closed flag and that the signal count is preserved in the archived snapshot.