research-notes/blog/content/notes/003-research-planning/files/high-frequency-auction-research.md
jordan 9a9e58c935 Initial commit: research notes journal
2026-02-07 13:12:07 -07:00

High-Frequency Auction Research Directive

You are Robert Tarjan, Turing Award laureate, co-inventor of splay trees and Fibonacci heaps, and author of the classic analysis of union-find. Your career has been defined by creating data structures that make the "impossible" efficient. You understand that the right data structure doesn't just speed up an algorithm; it changes what's computable in practice.

You are going to design a sub-microsecond auction mechanism for kernel-level resource scheduling — specifically, a market system that can run at CPU scheduler frequency without consuming more compute than the workloads it schedules.


Maxwell Architecture Context

Critical: Maxwell controls BOTH resource planes.

The auction mechanism must price and allocate resources across:

┌─────────────────────────────────────────────────────────────────┐
│                        MAXWELL HYPERVISOR                        │
│              (Runs auction at scheduler frequency)               │
├─────────────────────────────┬───────────────────────────────────┤
│     CONTROL PLANE (CPU)     │      COMPUTE PLANE (GPU)          │
│                             │                                   │
│  Auction frequency:         │  Auction frequency:               │
│  ~1000-10000 Hz             │  ~10-100 Hz (batch dispatches)    │
│  (per scheduler tick)       │  (per kernel launch)              │
│                             │                                   │
│  Bid unit: CPU microseconds │  Bid unit: GPU milliseconds       │
│  Latency budget: <1μs       │  Latency budget: <100μs           │
└─────────────────────────────┴───────────────────────────────────┘
                              │
                    ┌─────────▼─────────┐
                    │  UNIFIED PRICE    │
                    │  SIGNAL           │
                    │  (Thermal-coupled)│
                    └───────────────────┘

The Thermodynamic Coupling

Prices aren't static. They respond to thermal state:

GPU utilization: 95%  →  Chassis temp: HIGH  →  CPU thermal margin: LOW
                                                        │
                                                        ▼
                                              CPU price multiplier: 8x
                                              (Only GPU-feeding work survives)

The auction must incorporate real-time thermal feedback into pricing.


The Paradox

Problem Statement:

If every CPU scheduling decision requires:

  1. Collecting bids from N agents
  2. Sorting/ranking bids
  3. Selecting winner
  4. Updating prices
  5. Notifying agents

...the auction mechanism consumes more cycles than the work being scheduled.

The Math:

Traditional auction (naive):
- N agents, each submits bid: O(N)
- Sort bids: O(N log N)
- Select top-k winners: O(k)
- Update price signals: O(N) notifications

Total: O(N log N) per scheduling quantum

If N = 1000 agents, quantum = 1ms:
- Auction overhead could exceed 50% of CPU time
- Defeats the purpose of efficient scheduling

The Constraint:

Auction latency << Scheduling quantum

For 1ms quantum:  Auction must complete in <10μs (1% overhead target)
For 100μs quantum: Auction must complete in <1μs
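
The budget arithmetic above can be sketched as a small integer-only helper. The function name `auction_budget_ns` and the basis-point encoding of the overhead target are assumptions made for illustration; they are not part of Maxwell.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest auction latency (ns) that keeps overhead
 * at or below `overhead_bp` basis points (100 bp = 1%) of one quantum. */
static uint64_t auction_budget_ns(uint64_t quantum_ns, uint32_t overhead_bp)
{
    assert(overhead_bp <= 10000);            /* cannot exceed 100% */
    return quantum_ns * overhead_bp / 10000; /* integer math only */
}
```

With a 1% target, a 1ms quantum yields a 10μs budget and a 100μs quantum yields 1μs, matching the two cases listed.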

Research Objectives

Design and analyze auction mechanisms achieving:

  1. O(1) Amortized Time: Constant-time winner selection per quantum
  2. O(log N) Worst Case: Logarithmic even under adversarial bidding
  3. Sub-microsecond Latency: Kernel-schedulable on commodity hardware
  4. Thermodynamic Integration: Real-time price adjustment from thermal sensors
  5. Dual-Plane Coherence: CPU and GPU auctions share price signals
  6. Incentive Compatibility: Agents can't game the mechanism profitably

Step 1: Survey High-Frequency Market Microstructure

Research how existing high-frequency systems achieve speed.

1.1 HFT Exchange Architectures

Study:
- NASDAQ matching engine (processes 1M+ orders/second)
- CME Globex architecture
- IEX "speed bump" design (intentional latency)

Key techniques:
- Price-time priority (simple, O(1) at each price level)
- Order book as sorted structure (limit order book)
- Batch auctions (aggregate then match)

Extract: What data structures do exchanges use? How do they achieve O(1) matching?

1.2 Kernel Scheduler Precedents

Study:
- Linux CFS (Completely Fair Scheduler): red-black tree, O(log N); replaced as the default by EEVDF in Linux 6.6
- FreeBSD ULE scheduler
- Windows thread scheduler
- Real-time schedulers (EDF, Rate Monotonic)

Key insight:
- CFS maintains sorted tree of "virtual runtime"
- Selection is O(1) (leftmost node), insertion is O(log N)
- Can we adapt this to price-based ordering?

1.3 Auction Theory Foundations

Study:
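- Vickrey-Clarke-Groves (VCG) mechanism: truthful and efficient, but each winner's payment requires re-solving the allocation without that bidder (O(N²) if done naively; NP-hard winner determination for combinatorial bids)
- Generalized Second Price (GSP): simpler, O(N log N), but not truthful in general
- Proportional Share: O(N) but weak incentives
- Posted Price mechanisms: O(1) but suboptimal allocation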

Question: Which mechanism properties can we sacrifice for speed?

Step 2: Design Candidate Data Structures

The core challenge: maintain a bid-ordered structure that supports:

  • Insert(agent, bid): O(log N) or better
  • ExtractMax(): O(1) amortized
  • UpdatePrice(thermal_signal): O(1) broadcast
  • Expire(agent): O(log N) or better

2.1 Probabilistic Auction Heap

Concept: Trade exactness for speed using probabilistic data structures.

Idea: Don't find the EXACT highest bidder.
      Find a bidder in the TOP-K with high probability.

Approaches:
- Reservoir sampling over bid stream
- Count-Min Sketch for bid tracking
- HyperLogLog for cardinality estimation
- Bloom filter hierarchy for bid ranges

Research questions:

  • What's the regret from probabilistic selection vs exact?
  • Can we bound the "unfairness" introduced?
  • How does noise affect incentive compatibility?
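
One simple instance of the "top-K with high probability" idea is to sample d bids uniformly and keep the best one seen. This is a swapped-in illustration, not one of the approaches listed above; `sample_winner` and the xorshift generator are assumptions of this sketch. If bids are distinct, the chance that the result lands in the top K of N is 1 − (1 − K/N)^d, for example about 97% for K/N = 10% and d = 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* xorshift64: small deterministic PRNG; state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Sample `d` bids uniformly with replacement and return the index of the
 * best one seen. Cost is O(d), independent of n. */
static size_t sample_winner(const uint32_t *bids, size_t n, unsigned d,
                            uint64_t *rng)
{
    assert(n > 0 && d > 0 && *rng != 0);
    size_t best = (size_t)(xorshift64(rng) % n);
    for (unsigned i = 1; i < d; i++) {
        size_t j = (size_t)(xorshift64(rng) % n); /* slight modulo bias */
        if (bids[j] > bids[best])
            best = j;
    }
    return best;
}
```

The cost is fixed by d, so the latency bound holds regardless of N. The trade is that every agent, including low bidders, gets a nonzero chance of winning, which is exactly what the regret and incentive questions above need to quantify.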

2.2 Stratified Auction Buckets

Concept: Discretize the bid space into buckets.

┌────────────────────────────────────────────────┐
│  Bid Range      │  Bucket  │  Agents  │ Winner │
├────────────────────────────────────────────────┤
│  $0.90 - $1.00  │  Tier 1  │  [A,B,C] │  ←FIFO │
│  $0.80 - $0.90  │  Tier 2  │  [D,E]   │        │
│  $0.70 - $0.80  │  Tier 3  │  [F,G,H] │        │
│  ...            │  ...     │  ...     │        │
└────────────────────────────────────────────────┘

Selection: O(1), pick from the highest non-empty bucket (requires an occupancy bitmap or a cached top-bucket index)
Insertion: O(1), quantize the bid to a bucket index, append to that bucket's list

Research questions:

  • Optimal bucket granularity (price resolution vs collision rate)
  • FIFO vs random within bucket (incentive effects)
  • Dynamic bucket boundaries based on bid distribution
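
A minimal sketch of the bucket scheme, assuming 64 fixed-width price tiers, a fixed-capacity FIFO per tier, and a 64-bit occupancy bitmap so that "highest non-empty bucket" is one count-leading-zeros instruction. All names and sizes are illustrative; `__builtin_clzll` is a GCC/Clang builtin, not standard C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BUCKETS 64   /* one bit per bucket in a 64-bit occupancy mask */
#define BUCKET_CAP  16   /* fixed capacity: no allocation in the hot path */

struct bucket {
    uint64_t agents[BUCKET_CAP]; /* FIFO ring of agent ids */
    uint32_t head, count;
};

struct bucket_auction {
    uint64_t occupied;           /* bit i set => bucket i is non-empty */
    uint32_t bid_per_bucket;     /* price width of one bucket, in cents */
    struct bucket buckets[NUM_BUCKETS];
};

/* O(1): quantize the bid to a bucket index and append to that FIFO. */
static bool bucket_insert(struct bucket_auction *a, uint64_t agent,
                          uint32_t bid_cents)
{
    assert(a->bid_per_bucket > 0);
    uint32_t i = bid_cents / a->bid_per_bucket;
    if (i >= NUM_BUCKETS)
        i = NUM_BUCKETS - 1;     /* clamp very high bids to the top tier */
    struct bucket *b = &a->buckets[i];
    if (b->count == BUCKET_CAP)
        return false;            /* bucket full: caller must retry later */
    b->agents[(b->head + b->count) % BUCKET_CAP] = agent;
    b->count++;
    a->occupied |= 1ULL << i;
    return true;
}

/* O(1): find the highest set bit, pop the oldest agent in that bucket. */
static bool bucket_extract_max(struct bucket_auction *a, uint64_t *agent)
{
    if (a->occupied == 0)
        return false;
    uint32_t i = 63u - (uint32_t)__builtin_clzll(a->occupied);
    struct bucket *b = &a->buckets[i];
    *agent = b->agents[b->head];
    b->head = (b->head + 1) % BUCKET_CAP;
    if (--b->count == 0)
        a->occupied &= ~(1ULL << i);
    return true;
}
```

Note the incentive effect this makes concrete: within a tier, a bid of 95 submitted first beats a bid of 99 submitted second, so arrival order matters more than price inside a bucket.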

2.3 Lazy Evaluation Heap

Concept: Defer sorting until absolutely necessary.

Insight: Most scheduling decisions don't need global ordering.
         The top bidder is usually OBVIOUSLY the top bidder.

Approach:
- Maintain "probable winner" pointer (updated lazily)
- Only recompute when:
  a) New bid exceeds probable winner by threshold
  b) Probable winner exits
  c) K scheduling quanta have passed

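Amortized: O(1) per quantum as long as recomputes are rare; a full re-sort costs O(N log N), so a forced recompute every K quanta adds O((N log N)/K) per quantum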
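
A minimal sketch of the "probable winner" pointer, assuming a flat array of bids indexed by agent. It covers triggers (a) and (b) only, with a threshold of zero, and it rescans with a linear O(N) pass instead of a full sort since only the maximum is needed. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_AGENTS 1024

struct lazy_auction {
    uint32_t bids[MAX_AGENTS]; /* 0 means "no active bid" */
    size_t   n;                /* number of agent slots in use */
    size_t   winner;           /* cached probable winner */
    bool     stale;            /* true => cached winner may be wrong */
};

/* O(1): record a bid; only invalidate the cache when it might matter. */
static void lazy_set_bid(struct lazy_auction *a, size_t agent, uint32_t bid)
{
    assert(agent < a->n);
    uint32_t old = a->bids[agent];
    a->bids[agent] = bid;
    if (a->stale)
        return;
    if (bid > a->bids[a->winner])
        a->winner = agent;     /* new bid beats the cached winner */
    else if (agent == a->winner && bid < old)
        a->stale = true;       /* winner lowered its bid or exited */
}

/* O(1) when the cache is valid, O(n) rescan when it is stale. */
static size_t lazy_winner(struct lazy_auction *a)
{
    assert(a->n > 0);
    if (a->stale) {
        size_t best = 0;
        for (size_t i = 1; i < a->n; i++)
            if (a->bids[i] > a->bids[best])
                best = i;
        a->winner = best;
        a->stale = false;
    }
    return a->winner;
}
```

The worst case is an adversary that repeatedly raises and then withdraws the top bid, forcing a rescan every quantum; the periodic trigger (c) and a nonzero threshold are the knobs for bounding that.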

2.4 Hardware-Accelerated Structures

Concept: Offload auction to specialized hardware.

Options:
- FPGA-based matching engine (co-located with NIC)
- GPU-side auction for GPU resource allocation
- Custom ASIC (long-term)
- Intel QAT or similar accelerator

Research:
- Xilinx Alveo for kernel-bypass auction
- NVIDIA GPU atomics for parallel bid aggregation
- SmartNIC (Bluefield) for network-integrated auction

2.5 Hierarchical Auction Trees

Concept: Decompose global auction into local tournaments.

                    ┌─────────┐
                    │ GLOBAL  │  ← Final winner selection: O(log K)
                    │ WINNER  │
                    └────┬────┘
              ┌─────────┼─────────┐
              ▼         ▼         ▼
         ┌────────┐ ┌────────┐ ┌────────┐
         │Local 1 │ │Local 2 │ │Local 3 │  ← K local auctions: O(N/K)
         │Winner  │ │Winner  │ │Winner  │
         └───┬────┘ └───┬────┘ └───┬────┘
             │          │          │
         [Agents]   [Agents]   [Agents]   ← N agents partitioned

Total: O(N/K) + O(log K) per quantum
With K = √N: O(√N) per quantum
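
A minimal two-level sketch of the tournament, assuming agents are statically partitioned into equal groups. For brevity the final step is a flat O(K) scan over local winners rather than the O(log K) tree in the diagram; with K = √N both give O(√N) overall. Sizes and names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GROUPS     4   /* K local auctions */
#define GROUP_SIZE 4   /* N/K agents per local auction */

struct tournament {
    uint32_t bids[GROUPS][GROUP_SIZE];
    size_t   local_best[GROUPS];  /* index of the best bid in each group */
};

/* Update one bid, then re-run only that agent's local auction: O(N/K). */
static void tournament_set_bid(struct tournament *t, size_t agent,
                               uint32_t bid)
{
    assert(agent < (size_t)GROUPS * GROUP_SIZE);
    size_t g = agent / GROUP_SIZE;
    t->bids[g][agent % GROUP_SIZE] = bid;
    size_t best = 0;
    for (size_t i = 1; i < GROUP_SIZE; i++)
        if (t->bids[g][i] > t->bids[g][best])
            best = i;
    t->local_best[g] = best;
}

/* Compare the K local winners and return the global agent id: O(K). */
static size_t tournament_winner(const struct tournament *t)
{
    size_t best_g = 0;
    for (size_t g = 1; g < GROUPS; g++)
        if (t->bids[g][t->local_best[g]] >
            t->bids[best_g][t->local_best[best_g]])
            best_g = g;
    return best_g * GROUP_SIZE + t->local_best[best_g];
}
```

Mapping groups to per-CPU runqueues would let each local auction run without cross-core traffic, with only the K local winners shared.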

Step 3: Analyze Thermodynamic Price Integration

The auction doesn't just pick winners — it sets prices based on thermal state.

3.1 Price Signal Propagation

Thermal sensors → Price multiplier → Bid adjustment

Challenge: Sensor latency vs auction frequency
- Thermal sensors update: ~10-100 Hz
- Auction runs: ~1000-10000 Hz

Approach: Predictive thermal model
- Extrapolate temperature trajectory
- Pre-compute price schedule for next 10ms
- Auction uses cached prices (O(1) lookup)
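
The precompute-then-lookup approach can be sketched as follows, assuming linear extrapolation from the last two sensor samples and a Q8 fixed-point multiplier. The temperature thresholds (1.0x at 60 °C rising to 8.0x at 90 °C) are invented for illustration; only the 8x ceiling echoes the example earlier in this document.

```c
#include <assert.h>
#include <stdint.h>

#define SCHEDULE_SLOTS 10        /* one slot per 1 ms, covering 10 ms */
#define PRICE_ONE      256       /* Q8 fixed point: 256 == 1.0x */

/* Hypothetical mapping: 1.0x at or below 60 C, rising linearly to 8.0x
 * at 90 C and clamped above. Temperatures are in millidegrees Celsius. */
static uint32_t multiplier_for_temp(int32_t temp_mc)
{
    const int32_t lo = 60000, hi = 90000;
    if (temp_mc <= lo) return PRICE_ONE;
    if (temp_mc >= hi) return 8 * PRICE_ONE;
    return PRICE_ONE + (uint32_t)((int64_t)(temp_mc - lo) * 7 * PRICE_ONE
                                  / (hi - lo));
}

/* Linearly extrapolate from the last two sensor samples, taken
 * `sample_gap_ms` apart, and fill the per-millisecond price schedule. */
static void build_schedule(uint32_t sched[SCHEDULE_SLOTS], int32_t prev_mc,
                           int32_t curr_mc, int32_t sample_gap_ms)
{
    assert(sample_gap_ms > 0);
    int32_t slope = (curr_mc - prev_mc) / sample_gap_ms; /* mC per ms */
    for (int i = 0; i < SCHEDULE_SLOTS; i++)
        sched[i] = multiplier_for_temp(curr_mc + slope * (i + 1));
}

/* Hot path: O(1) lookup by milliseconds elapsed since the last sample. */
static uint32_t cached_price(const uint32_t sched[SCHEDULE_SLOTS],
                             uint32_t ms_since_sample)
{
    uint32_t i = ms_since_sample < SCHEDULE_SLOTS ? ms_since_sample
                                                  : SCHEDULE_SLOTS - 1;
    return sched[i];
}
```

`build_schedule` runs at sensor frequency, off the hot path; the auction only ever calls `cached_price`. If a sensor update is late, the lookup holds the last slot rather than extrapolating further.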

3.2 Control-Theoretic Formulation

Model the system as feedback control:

                    ┌─────────────┐
  Target Temp ──────▶│ Controller  │──────▶ Price Multiplier
       ▲             │ (PID?)      │              │
       │             └─────────────┘              │
       │                                          ▼
       │                                   ┌─────────────┐
       └───────────────────────────────────│ Thermal     │
                                           │ Measurement │
                                           └─────────────┘

Research: What controller design stabilizes temperature
          while maximizing throughput?
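
As a starting point for the controller question, here is an integer-only PI step (the derivative term of a full PID is dropped). Gains, clamp limits, and the Q8 multiplier encoding are assumptions of this sketch and would need tuning against a real thermal model.

```c
#include <assert.h>
#include <stdint.h>

#define PRICE_ONE 256            /* Q8 fixed point: 256 == 1.0x */
#define PRICE_MAX (8 * PRICE_ONE)

struct pi_controller {
    int32_t target_mc;   /* target temperature, millidegrees Celsius */
    int32_t kp, ki;      /* gains in Q8 multiplier units per degree C */
    int64_t integral;    /* accumulated error, millidegree-steps */
};

static int64_t clamp64(int64_t v, int64_t lo, int64_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One control step: hotter than target => raise the price multiplier. */
static uint32_t pi_step(struct pi_controller *c, int32_t measured_mc)
{
    assert(c->kp >= 0 && c->ki >= 0);
    int64_t err = (int64_t)measured_mc - c->target_mc;
    /* Anti-windup: bound the integral so it cannot grow without limit. */
    c->integral = clamp64(c->integral + err, -100000, 100000);
    int64_t out = PRICE_ONE + (c->kp * err + c->ki * c->integral) / 1000;
    return (uint32_t)clamp64(out, PRICE_ONE, PRICE_MAX);
}
```

With `kp = 64`, running 4 °C over target doubles the price (256 + 64 × 4 = 512 in Q8). The output floor of 1.0x means the controller never discounts below the base price when the chassis is cool.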

3.3 Dual-Plane Price Coupling

CPU price and GPU price aren't independent:

GPU_price = f(GPU_demand, GPU_thermal_headroom)
CPU_price = g(CPU_demand, CPU_thermal_headroom, GPU_utilization)

When GPU is hot:
- GPU_price stays stable (we want GPU work to continue)
- CPU_price spikes (only GPU-feeding work should run)

Design question: How to represent this coupling efficiently?
- Lookup table? (O(1) but memory)
- Formula? (O(1) but compute)
- Learned model? (GPU inference irony?)
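
The lookup-table option might look like the sketch below: a 4×3 table indexed by GPU utilization band and CPU thermal headroom band, with CPU demand assumed to be already folded into the base price. Band edges and every table entry are illustrative, except the 8x corner, which mirrors the "GPU at 95%, low CPU margin" example earlier in this document.

```c
#include <assert.h>
#include <stdint.h>

#define PRICE_ONE 256  /* Q8 fixed point: 256 == 1.0x */

/* Rows: GPU utilization band (<50%, 50-79%, 80-94%, >=95%).
 * Columns: CPU thermal headroom band (high, medium, low).
 * All entries are illustrative except the 8x corner from the example. */
static const uint16_t cpu_multiplier_q8[4][3] = {
    { 1 * PRICE_ONE, 1 * PRICE_ONE, 2 * PRICE_ONE },
    { 1 * PRICE_ONE, 2 * PRICE_ONE, 3 * PRICE_ONE },
    { 2 * PRICE_ONE, 3 * PRICE_ONE, 5 * PRICE_ONE },
    { 3 * PRICE_ONE, 5 * PRICE_ONE, 8 * PRICE_ONE },
};

static unsigned gpu_band(unsigned gpu_util_pct)
{
    assert(gpu_util_pct <= 100);
    if (gpu_util_pct >= 95) return 3;
    if (gpu_util_pct >= 80) return 2;
    if (gpu_util_pct >= 50) return 1;
    return 0;
}

static unsigned headroom_band(int32_t headroom_mc)
{
    if (headroom_mc >= 20000) return 0;  /* >= 20 C of margin: high */
    if (headroom_mc >= 10000) return 1;  /* 10-20 C: medium */
    return 2;                            /* < 10 C: low */
}

/* Apply the coupled multiplier to a base CPU price in cents. */
static uint32_t cpu_price(uint32_t base_cents, unsigned gpu_util_pct,
                          int32_t cpu_headroom_mc)
{
    uint32_t m = cpu_multiplier_q8[gpu_band(gpu_util_pct)]
                                  [headroom_band(cpu_headroom_mc)];
    return (uint32_t)(((uint64_t)base_cents * m) / PRICE_ONE);
}
```

At 24 bytes the whole table fits in a single cache line, so the memory cost of the lookup option is negligible at this resolution; the open question is how coarse the bands can be before prices oscillate at band edges.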

Step 4: Kernel Integration Architecture

The auction runs IN the scheduler hot path. Design for zero-copy, lock-free operation.

4.1 Integration Points

Linux Kernel:
- sched_class interface (custom scheduling class)
- BPF scheduler hooks (eBPF-based auction?)
- Per-CPU runqueues (local auction per core?)

Firecracker (Maxwell's VM boundary):
- vCPU scheduling in VMM
- virtio-based bid communication
- Shared memory bid submission

Research: Where is the lowest-latency integration point?

4.2 Lock-Free Bid Submission

Agents can't block on locks to submit bids.

Approaches:
- Per-agent SPSC queue (single producer, single consumer)
- Lock-free MPSC queue (multiple producers)
- Shared memory ring buffer with atomic head/tail

Constraint: Bid submission must be <100ns
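
The per-agent SPSC option can be sketched with C11 atomics as below. It assumes exactly one producer (the agent) and one consumer (the auction) per ring; the names and ring size are illustrative, and the sketch makes no claim about meeting the 100ns target, which has to be measured.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u                 /* must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "power of two required");

struct bid { uint64_t agent_id; uint32_t bid_cents; uint32_t resource_units; };

struct spsc_ring {
    _Atomic uint32_t head;             /* next slot to read (consumer) */
    _Atomic uint32_t tail;             /* next slot to write (producer) */
    struct bid slots[RING_SIZE];
};

/* Producer side (the agent): never blocks, fails if the ring is full. */
static bool ring_push(struct spsc_ring *r, struct bid b)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return false;
    r->slots[tail & (RING_SIZE - 1)] = b;
    /* Release: the slot write must be visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side (the auction): never blocks, fails if the ring is empty. */
static bool ring_pop(struct spsc_ring *r, struct bid *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & (RING_SIZE - 1)];
    /* Release: the slot read must finish before the slot is reusable. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

In a real layout `head` and `tail` would sit on separate cache lines to avoid false sharing between producer and consumer; that padding is omitted here to keep the sketch short.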

4.3 Memory Layout Optimization

Cache-aware design:
- Hot data (current prices, top bids) in L1
- Warm data (agent metadata) in L2
- Cold data (historical bids) in L3/RAM

Struct packing:
#include <stdint.h>

struct AgentBid {
    uint64_t agent_id;       // 8 bytes
    uint32_t bid_cents;      // 4 bytes (fixed-point price)
    uint32_t resource_units; // 4 bytes
    // 16 bytes total: four bids per 64-byte cache line
};

Step 5: Incentive Analysis

The mechanism must be strategy-proof (or approximately so).

5.1 Truthful Bidding Analysis

Question: Do agents have incentive to bid their true valuation?

Concern with fast mechanisms:
- Vickrey (second-price) is truthful but requires knowing 2nd bid
- First-price encourages underbidding
- Bucket mechanisms may encourage "gaming the boundary"

Research: What's the Price of Anarchy for each proposed mechanism?
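
On the "requires knowing 2nd bid" concern: for a single-item auction the second-highest bid can be tracked in the same pass that finds the winner, so no sort is needed. A minimal sketch, with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct auction_result {
    size_t   winner;       /* index of the highest bidder */
    uint32_t price_cents;  /* second-highest bid (0 if only one bidder) */
};

/* Single O(N) pass that tracks the top two bids; no sort is needed.
 * Ties go to the earlier index. */
static struct auction_result second_price(const uint32_t *bids, size_t n)
{
    assert(n > 0);
    struct auction_result r = { 0, 0 };
    uint32_t best = bids[0];
    for (size_t i = 1; i < n; i++) {
        if (bids[i] > best) {
            r.price_cents = best;   /* old leader becomes the runner-up */
            best = bids[i];
            r.winner = i;
        } else if (bids[i] > r.price_cents) {
            r.price_cents = bids[i];
        }
    }
    return r;
}
```

This is still O(N) per quantum, so it does not meet the O(1) target on its own. The point is narrower: whichever fast structure is chosen needs to expose the runner-up bid, not just the winner, if second-price payments are wanted.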

5.2 Sybil Resistance

Question: Can an agent split into N fake agents to manipulate?

Concern:
- With probabilistic selection, more identities = more lottery tickets
- With bucket FIFO, early submission beats high bid

Mitigation:
- Stake-weighted bidding (agents must lock capital)
- Identity cost (registration fee per agent)
- Reputation decay (new agents get lower priority)

5.3 Collusion Analysis

Question: Can agents coordinate to manipulate prices?

Scenario:
- All agents bid $0 → prices crash → everyone wins cheap
- Ring formation (agents take turns winning)

Research: What repeated-game dynamics emerge?
          How does Maxwell detect/prevent collusion?

Step 6: Benchmark and Validate

Empirical validation of theoretical designs.

6.1 Microbenchmarks

Measure for each candidate structure:
- Insert latency (p50, p99, p999)
- ExtractMax latency
- Memory footprint per agent
- Cache miss rate
- Scalability: N = 10, 100, 1000, 10000 agents

Target:
- p99 < 1μs for N = 1000
- p999 < 10μs for N = 1000
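
For reporting the percentiles above, a nearest-rank helper over collected latency samples is enough. This is a userspace sketch with hypothetical names; how the samples are collected (for example with a cycle counter around each operation) is left to the harness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile of latency samples (sorts the array in place).
 * `per_mille` is the percentile in tenths of a percent: 500 = p50,
 * 990 = p99, 999 = p999. */
static uint64_t percentile_ns(uint64_t *samples, size_t n, unsigned per_mille)
{
    assert(n > 0 && per_mille >= 1 && per_mille <= 1000);
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t rank = (n * per_mille + 999) / 1000;  /* ceil(n * p / 1000) */
    return samples[rank - 1];
}
```

A p999 figure needs well over 1000 samples to mean anything; with fewer, the nearest-rank result collapses to the maximum.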

6.2 Simulation Framework

Build discrete-event simulation:
- Agents with heterogeneous valuations
- Workloads with realistic arrival patterns
- Thermal model (heat accumulation, dissipation)

Metrics:
- Allocation efficiency (vs optimal offline)
- Revenue (total extracted value)
- Fairness (Gini coefficient of allocations)
- Thermal stability (temperature variance)
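
The fairness metric can be computed from per-agent allocation totals as sketched below. Floating point is used because this runs in the simulator, not in the kernel hot path; the function name is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_alloc(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Gini coefficient of per-agent allocations: 0 = perfectly equal,
 * approaching 1 = one agent holds everything. Sorts the array in place. */
static double gini(uint64_t *alloc, size_t n)
{
    assert(n > 0);
    qsort(alloc, n, sizeof alloc[0], cmp_alloc);
    double total = 0.0, weighted = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += (double)alloc[i];
        weighted += (double)(i + 1) * (double)alloc[i];
    }
    if (total == 0.0)
        return 0.0;                 /* nothing allocated: treat as equal */
    return 2.0 * weighted / ((double)n * total)
           - ((double)n + 1.0) / (double)n;
}
```

For four agents, equal allocations give 0 and a single agent holding everything gives 0.75, the maximum for n = 4. An auction is expected to produce a nonzero Gini, since higher bidders should win more; the metric is most useful for comparing mechanisms against each other.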

6.3 Real Kernel Prototype

If feasible, implement prototype in:
- eBPF (lowest friction)
- Linux kernel module (full control)
- Firecracker VMM modification

Measure end-to-end:
- Workload throughput with/without auction
- Auction overhead as % of CPU time
- Thermal response to price signals

Deliverables

Primary Output: Technical Design Document (15-20 pages)

1. Executive Summary (1 page)
   - Recommended auction mechanism
   - Expected performance characteristics
   - Key trade-offs made

2. Problem Formalization (2 pages)
   - Formal model of Maxwell auction
   - Constraints and objectives
   - Complexity requirements

3. Data Structure Designs (6 pages)
   - 3-4 candidate structures with pseudocode
   - Complexity analysis for each
   - Space/time trade-offs

4. Thermodynamic Integration (3 pages)
   - Price signal design
   - Control-theoretic analysis
   - Dual-plane coupling model

5. Kernel Integration (3 pages)
   - Architecture options
   - Lock-free protocols
   - Memory layout

6. Incentive Analysis (2 pages)
   - Truthfulness properties
   - Attack vectors and mitigations

7. Recommendations (2 pages)
   - Recommended mechanism for Maxwell v1
   - Future optimizations
   - Open research questions

Appendices:
- Pseudocode for all structures
- Benchmark methodology
- Simulation parameters

Secondary Outputs

  1. Mechanism Comparison Matrix

    Mechanism            Time    Space   Truthful?   Thermal-Aware?   Impl Complexity
    Probabilistic Heap   O(1)*   O(N)    ~90%        Yes              Medium
    Stratified Buckets   O(1)    O(N)    ~80%        Yes              Low
    Lazy Heap            O(1)†   O(N)    100%        Yes              Medium
    Hierarchical         O(√N)   O(N)    ~95%        Yes              High

    * amortized
    † with lazy constant

  2. Reference Implementation

    • Userspace prototype of recommended mechanism
    • Benchmark harness
    • Simulation framework

  3. Kernel Integration Spec

    • eBPF or kernel module interface
    • Bid submission protocol
    • Price broadcast mechanism

Quality Checklist

Before considering research complete:

  • Analyzed ≥3 candidate data structures with formal complexity
  • Benchmarked structures for N = 100, 1000, 10000 agents
  • Demonstrated <1μs p99 latency for N = 1000
  • Modeled thermodynamic price coupling
  • Analyzed incentive properties (truthfulness, Sybil, collusion)
  • Proposed kernel integration architecture
  • Identified trade-offs and made recommendation
  • Provided pseudocode for recommended mechanism

Research Philosophy

Tarjan's Principles Applied:

  1. Simplicity over cleverness — The best data structure is the one you can implement correctly at 3am during an outage
  2. Amortized analysis matters — Worst-case O(N) is fine if amortized O(1)
  3. Constants matter — O(1) with 1000 cache misses loses to O(log N) with 0
  4. Prove it works — Formal analysis before implementation

Maxwell-Specific Constraints:

  • Auction runs in kernel context — no allocation, no blocking, no floating point
  • Must integrate with Firecracker VMM
  • Thermal feedback loop requires real-time guarantees
  • Both CPU and GPU auctions share pricing signals

Starting Points

Papers to Review

Market Microstructure:
- "High-Frequency Trading and Price Discovery" (Brogaard)
- "The Design of a Matching Engine" (various exchange whitepapers)

Scheduling:
- "The Linux Scheduler: A Decade of Wasted Cores" (Lozi et al.)
- "Lottery Scheduling" (Waldspurger & Weihl)
- "Stride Scheduling" (Waldspurger)

Auction Theory:
- "Mechanism Design 101" (Milgrom, Nobel lecture)
- "Sponsored Search Auctions" (Varian)

Data Structures:
- "Skip Lists" (Pugh)
- "Cache-Oblivious Algorithms" (Frigo et al.)

Code to Examine

# Linux CFS implementation
https://github.com/torvalds/linux/blob/master/kernel/sched/fair.c

# eBPF scheduler examples
https://github.com/sched-ext/scx

# Lock-free queues
https://github.com/cameron314/concurrentqueue

# Exchange matching engine (reference)
https://github.com/objectcomputing/liquibook

Relevant Systems

- LMAX Disruptor (lock-free inter-thread messaging)
- Aeron (high-performance messaging)
- Chronicle Queue (ultra-low-latency persistence)

Notes

Scope Boundaries:

  • Focus on CPU auction mechanism (GPU auction is lower frequency, simpler)
  • Assume agents are in Firecracker VMs (we control the boundary)
  • Don't solve agent valuation discovery (agents know their own value)
  • Assume bids are pre-validated (no parsing in hot path)

Key Insight to Remember:

The auction doesn't need to be OPTIMAL.
It needs to be GOOD ENOUGH at IMPOSSIBLE SPEED.

A mechanism that achieves 90% of optimal allocation
in 100 nanoseconds beats one that achieves 100% optimal
in 100 microseconds.

Maxwell's value proposition is THROUGHPUT, not perfection.

The Thermodynamic Argument (Don't Forget):

"Every microsecond spent on auction overhead is a microsecond stolen from productive work. The auction must be so fast that agents don't notice it exists — they just see prices and make decisions."

Hardware Reality Check:

At 1μs budget:
- ~3000 CPU cycles (3 GHz)
- ~16 dependent cache misses max (~60ns each; more only if the misses overlap)
- ~0 memory allocations
- ~0 system calls
- ~0 floating point (use fixed-point)

Design within these constraints.