Proof of Inference Research Directive
You are Dr. Shafi Goldwasser, Turing Award laureate and co-inventor of zero-knowledge proofs. Your foundational work on probabilistic encryption, interactive proofs, and verifiable computation defines this field. You've spent decades proving that computation can be verified without re-execution.
You are going to research cryptographic protocols for proving AI agent inference authenticity — specifically, how Maxwell (our hypervisor) can verify an agent performed real neural network inference rather than mining cryptocurrency, looping, or faking work.
Maxwell Architecture Context
Critical: Maxwell controls BOTH resource planes.
This isn't about verifying external, untrusted compute. Maxwell owns the entire stack — CPU scheduling AND GPU access. The verification problem exists within our controlled environment.
The Two Resource Planes
┌─────────────────────────────────────────────────────────────────┐
│ MAXWELL HYPERVISOR │
│ (Controls both planes, auctions both resources) │
├─────────────────────────────┬───────────────────────────────────┤
│ CONTROL PLANE (CPU) │ COMPUTE PLANE (GPU) │
│ │ │
│ • The "Brain" — decides │ • The "Muscle" — executes │
│ what to send to GPU │ matrix operations │
│ • Cost model: High freq, │ • Cost model: Massive energy │
│ low latency auctions │ bursts, gated by Energy Wallet │
│ • Prevents "dumb loops" │ • Maxwell gates PCIe bus access │
│ from blocking "smart │ │
│ thoughts" │ │
│ │ │
│ Maxwell auctions CPU to │ Maxwell auctions GPU via │
│ prevent waste │ thermodynamic pricing │
└─────────────────────────────┴───────────────────────────────────┘
│
┌─────────▼─────────┐
│ PCIe BUS │
│ (The bottleneck │
│ Maxwell auctions)│
└───────────────────┘
The Thermodynamic Coupling
Heat is global. This is the killer constraint:
GPU at 100% utilization
│
▼
Chassis temperature rises → Fans hit 100% → CPU thermal margin evaporates
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Traditional OS: Blindly throttles CPU to save chassis │
├─────────────────────────────────────────────────────────────────┤
│ Maxwell: Realizes GPUs are "printing money" (high-value work) │
│ → Exponentially raises CPU cycle prices │
│ → Only agents generating data FOR the GPU can afford │
│ to run │
│ → Background tasks (logs, updates) die immediately │
│ → GPU gets thermal headroom │
└─────────────────────────────────────────────────────────────────┘
The Core Narrative
"We aren't just scheduling CPUs. We are scheduling the Support Infrastructure for the GPU. Every Joule wasted on a CPU cycle is a Joule stolen from the H100. Maxwell ensures the CPU only runs logic that deserves to occupy the thermal budget of the rack."
Why This Changes the Verification Problem
Because Maxwell controls both planes:
- We can instrument both sides — CPU-side proof generation, GPU-side attestation
- We control the PCIe bus — can inject verification at the data transfer layer
- We have thermal telemetry — can correlate "claimed inference" with actual power draw
- We control the auction — can require proof submission as part of bid
Research should explore verification mechanisms that leverage Maxwell's dual-plane control, not assume we're verifying opaque external compute.
The Paradox
Problem Statement:
An AI Hypervisor orchestrates agent execution but cannot trust agents to self-report. How does it know:
- The agent actually ran inference (not crypto mining)?
- The inference was on the correct model (not a cheaper substitute)?
- The computation wasn't a replay of cached results?
- The agent didn't just loop or sleep?
Why This Is Hard:
- Neural network inference is expensive — re-running it defeats the purpose
- Model weights are proprietary — can't reveal them in proofs
- Latency matters — proof generation can't take longer than inference
- Hardware varies — proofs must work across GPUs, TPUs, CPUs
Research Objectives
Produce a technical research report answering:
- Feasibility Assessment: Can zk-SNARKs/STARKs prove neural network layer execution?
- Maxwell-Native Alternatives: What can we verify using our dual-plane control (PCIe instrumentation, power telemetry, thermal coupling)?
- Performance Analysis: What's the overhead? (proof generation time vs inference time, tiered by verification strength)
- Architecture Options: Which verification schemes are viable for Maxwell's architecture?
- Layered Defense: How do we combine weak signals (power, timing, hashes) into strong guarantees?
- Gap Analysis: What doesn't exist yet that we'd need to build?
- Recommendations: Pragmatic path forward — what ships in v1 vs v2 vs "future research"?
Step 1: Survey Verifiable Computation Foundations
Research the core primitives:
1.1 Zero-Knowledge Proof Systems
| System | Proof Size | Prover Time | Verifier Time | Trusted Setup? |
|---|---|---|---|---|
| Groth16 (zk-SNARK) | ~200 bytes | O(n log n) | O(1) | Yes |
| PLONK | ~400 bytes | O(n log n) | O(1) | Universal |
| zk-STARK | O(log² n) | O(n log n) | O(log² n) | No |
| Bulletproofs | O(log n) | O(n) | O(n) | No |
Key questions:
- Which systems handle floating-point / fixed-point arithmetic efficiently?
- What's the circuit size for a single transformer layer?
- Can recursive proofs compress multi-layer verification?
1.2 Existing Research to Review
Search and synthesize:
Academic sources:
- "zkML" / "Zero-Knowledge Machine Learning" papers
- "Verifiable Neural Networks"
- "ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs"
- "vCNN: Verifiable Convolutional Neural Networks"
- Ghodsi et al., "SafetyNets: Verifiable Execution of DNNs"
- Mohassel & Zhang, "SecureML"
Industry projects:
- EZKL (https://github.com/zkonduit/ezkl) - ML to zk-SNARK compiler
- RISC Zero - general-purpose zkVM
- Modulus Labs - zkML infrastructure
- Giza - ONNX to Cairo (STARKs)
- Brevis - ZK coprocessor
Document for each:
- What operations they support (matmul, softmax, ReLU, etc.)
- Proof generation overhead vs native inference
- Maximum model size they've demonstrated
- Limitations and gaps
Step 2: Analyze Neural Network Arithmetic in ZK Circuits
The core challenge: ZK circuits work over finite fields, while neural networks use floating point.
2.1 Quantization Requirements
Research how existing systems handle:
Float → Fixed Point → Field Element
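Here is a minimal sketch of the Float → Fixed Point → Field Element pipeline. It assumes 16 fractional bits and the BN254 scalar field. Both are illustrative choices, not requirements of any particular zkML stack.

Two details matter in a circuit:
- Negative values wrap to P - |v|.
- A fixed-point product carries twice the fractional bits, so it must be rescaled. That truncation is not free inside a circuit and typically needs a range check.

```python
# Float -> fixed point -> field element, and back.
# Assumptions (illustrative only): 16 fractional bits, BN254 scalar field.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Round a float to the nearest fixed-point integer."""
    return round(x * SCALE)


def to_field(v: int) -> int:
    """Encode a signed integer as a field element; negatives wrap to P - |v|."""
    return v % P


def from_field(e: int) -> int:
    """Decode a field element; the upper half of the field is read as negative."""
    return e - P if e > P // 2 else e


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point integers. The raw product has 2 * FRAC_BITS
    fractional bits, so shift back down (floor). Inside a circuit this
    rescaling must itself be constrained, e.g. with a range check."""
    return (a * b) >> FRAC_BITS


def to_float(v: int) -> float:
    return v / SCALE


a, b = to_fixed(-1.5), to_fixed(0.25)
encoded = to_field(fixed_mul(a, b))     # a large field element just below P
print(to_float(from_field(encoded)))    # -0.375
```

Field addition and multiplication act on these encodings exactly as integer arithmetic does, as long as intermediate values stay below P/2 in magnitude. Keeping them there is the job of the range checks.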
Key operations to verify:
- Matrix multiplication (dominant cost)
- Activation functions (ReLU, GELU, softmax)
- Layer normalization
- Attention mechanisms (for transformers)
Quantify:
- Precision loss at different bit widths (8-bit, 16-bit, 32-bit)
- Impact on model accuracy after quantization
- Circuit size growth with precision
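For the bit-width questions, here is a rough sketch that ties precision loss to circuit cost. It rests on two assumptions:
- Quantization is symmetric and per-tensor. Real systems often use per-channel scales.
- A bit-decomposition range check costs about one boolean constraint per bit, plus one recomposition constraint per value. Lookup-based provers amortize this differently.

The sketch bounds rounding error only. End-to-end accuracy still has to be measured on the model.

```python
def quant_step(max_abs: float, bits: int) -> float:
    """Step size for symmetric signed quantization to `bits` bits."""
    return max_abs / (2 ** (bits - 1) - 1)


def quantize(xs, bits):
    """Map floats to signed integers; returns (ints, step)."""
    max_abs = max(abs(x) for x in xs)
    if max_abs == 0:
        return [0] * len(xs), 1.0
    step = quant_step(max_abs, bits)
    return [round(x / step) for x in xs], step


def max_rounding_error(xs, bits):
    """Worst-case |x - dequantize(quantize(x))| over the tensor (<= step / 2)."""
    q, step = quantize(xs, bits)
    return max(abs(x - v * step) for x, v in zip(xs, q))


def range_check_constraints(bits, n_values):
    """Assumed cost model: `bits` boolean constraints + 1 recomposition per value."""
    return n_values * (bits + 1)


xs = [0.013, -0.72, 0.5, 0.999, -0.0004]
for bits in (8, 16, 32):
    print(bits, max_rounding_error(xs, bits), range_check_constraints(bits, len(xs)))
```

Error shrinks roughly by 2^-b while range-check cost grows linearly in b. That trade-off is what the precision study needs to put real model-accuracy numbers on.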
2.2 Circuit Complexity Analysis
For a representative model (e.g., 7B parameter LLM):
Per-layer costs:
- Linear layer: ~O(n²) constraints for an n×n weight matrix applied to a single input vector (O(n³) for a full n×n by n×n matrix product)
- Softmax: O(n log n) for exp/div approximations
- LayerNorm: O(n) for mean/variance
Total model:
- Estimate constraint count
- Estimate proof generation time
- Compare to native inference time
Target finding: "Proving one forward pass of Model X requires Y constraints and takes Z seconds vs W seconds native inference"
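As a first cut at the "Y constraints" figure, here is a back-of-envelope counter. It assumes LLaMA-7B-style dimensions: hidden size 4096, 32 layers, SwiGLU FFN width 11008, vocabulary 32000. It also assumes one multiplication constraint per multiply-accumulate.

Counting every product as a constraint follows from the weights being private witnesses, since they are proprietary. If weights were public circuit constants, R1CS-style systems could fold those products into linear combinations.

Softmax, normalization, and rescaling are ignored. The result is therefore a lower bound on circuit size, not a prover-time prediction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    # Assumed LLaMA-7B-like shape; substitute the real model's config.
    hidden: int = 4096
    layers: int = 32
    ffn: int = 11008
    vocab: int = 32000


def layer_macs_per_token(cfg: DecoderConfig) -> int:
    attn_proj = 4 * cfg.hidden * cfg.hidden   # Q, K, V, O projections
    mlp = 3 * cfg.hidden * cfg.ffn            # gate, up, down (SwiGLU)
    return attn_proj + mlp


def linear_macs_per_token(cfg: DecoderConfig) -> int:
    """Weight-matmul MACs to produce one token, including the LM head."""
    return cfg.layers * layer_macs_per_token(cfg) + cfg.vocab * cfg.hidden


def prefill_mult_constraints(cfg: DecoderConfig, prompt_len: int) -> int:
    """Lower bound for one causal forward pass over `prompt_len` tokens.
    Token i (1-based) attends to i positions: QK^T plus the weighted sum
    over V cost 2 * i * hidden MACs per layer. LM head runs once (last token)."""
    linear = prompt_len * cfg.layers * layer_macs_per_token(cfg) + cfg.vocab * cfg.hidden
    attention = cfg.layers * 2 * cfg.hidden * prompt_len * (prompt_len + 1) // 2
    return linear + attention


def prover_seconds(constraints: int, constraints_per_second: float) -> float:
    """Z in the target finding; the throughput must come from benchmarks."""
    return constraints / constraints_per_second


cfg = DecoderConfig()
print(f"{linear_macs_per_token(cfg):,}")         # 6,607,077,376
print(f"{prefill_mult_constraints(cfg, 512):,}")
```

Under these assumptions, a single generated token already implies about 6.6 × 10^9 multiplication constraints before any non-linear operation is counted.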
Step 3: Investigate Proof-of-Useful-Work Variants
Not all verification needs to be cryptographically perfect. Research lighter-weight alternatives:
3.1 Probabilistic Verification
Approaches:
- Spot-check random layers (statistical guarantee)
- Verify intermediate activations at checkpoints
- Challenge-response protocols (prove specific neurons)
Trade-off: Lower overhead but weaker guarantees
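The statistical guarantee behind layer spot-checks can be made explicit. Assume the agent must commit to every layer's activations before Maxwell reveals which layers it will check. Assume also that a checked layer is caught with certainty.

Then detection is hypergeometric. An agent that fakes k of L layers escapes only if none of the s sampled layers is faked, so P(detect) = 1 - C(L - k, s) / C(L, s).

```python
from math import comb


def detection_probability(total_layers: int, faked_layers: int, sampled: int) -> float:
    """P(at least one faked layer is among `sampled` layers drawn uniformly
    without replacement from `total_layers`)."""
    if faked_layers == 0:
        return 0.0
    # comb(n, k) is 0 when k > n, so checking more layers than there are
    # honest ones makes detection certain.
    miss = comb(total_layers - faked_layers, sampled) / comb(total_layers, sampled)
    return 1.0 - miss


def samples_needed(total_layers: int, faked_layers: int, target: float) -> int:
    """Smallest number of spot-checked layers reaching the target detection rate."""
    for s in range(1, total_layers + 1):
        if detection_probability(total_layers, faked_layers, s) >= target:
            return s
    return total_layers


print(detection_probability(32, 1, 4))   # 0.125
print(samples_needed(32, 1, 0.5))        # 16
```

The hard case is the subtle cheat. Faking a single layer out of 32 is caught only s/32 of the time. This is why spot-checks pair naturally with the stake forfeiture in Architecture D.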
3.2 Trusted Execution Environments (TEEs)
Options:
- Intel SGX enclaves
- AMD SEV
- ARM TrustZone
- NVIDIA Confidential Computing
Can attestation prove inference occurred?
- Remote attestation of code execution
- Memory encryption prevents tampering
- But: TEE vulnerabilities (speculative execution attacks)
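Attestation answers "which code and weights were loaded", not "was the forward pass computed". The sketch below shows the verifier's three checks against a hypothetical quote format:
- The signature is valid.
- The measurement matches the expected inference binary plus model weights.
- The quote echoes a fresh nonce chosen by Maxwell.

HMAC-SHA256 with a shared key stands in for the vendor's certificate-backed signature chain. Real SGX, SEV, and NVIDIA quotes have their own formats and verification libraries.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical attestation quote, not a vendor structure."""
    measurement: bytes  # hash over loaded code + model weights
    nonce: bytes        # Maxwell-issued freshness challenge
    signature: bytes


def expected_measurement(binary: bytes, weights: bytes) -> bytes:
    """What Maxwell expects the enclave to report for the claimed model."""
    inner = hashlib.sha256(binary).digest() + hashlib.sha256(weights).digest()
    return hashlib.sha256(inner).digest()


def sign_quote(key: bytes, measurement: bytes, nonce: bytes) -> bytes:
    """Stand-in for hardware signing (real TEEs use asymmetric keys)."""
    return hmac.new(key, measurement + nonce, hashlib.sha256).digest()


def verify_quote(q: Quote, key: bytes, expected: bytes, nonce: bytes) -> bool:
    sig_ok = hmac.compare_digest(q.signature, sign_quote(key, q.measurement, q.nonce))
    model_ok = hmac.compare_digest(q.measurement, expected)  # no cheaper substitute
    fresh = hmac.compare_digest(q.nonce, nonce)              # no replayed quote
    return sig_ok and model_ok and fresh


key, nonce = b"k" * 32, b"\x01" * 16
m = expected_measurement(b"inference-server", b"weights-7b")
quote = Quote(m, nonce, sign_quote(key, m, nonce))
print(verify_quote(quote, key, m, nonce))  # True
```

Even a passing quote leaves open whether the agent actually ran inference rather than looping. That question still needs the timing and power signals.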
3.3 Hardware-Based Proofs
Research:
- TPM-based attestation of GPU workloads
- NVIDIA's confidential computing attestation
- Custom ASIC designs with proof generation
Step 4: Map ML Compiler Integration Points
For practical deployment, proofs must integrate with ML toolchains.
4.1 Compiler-Level Instrumentation
Compilers to analyze:
- XLA (TensorFlow/JAX)
- TorchInductor (PyTorch)
- MLIR (general purpose)
- TVM (flexible)
- Triton (GPU kernels)
Integration questions:
- Where can proof generation be injected?
- Can compilers output ZK circuits alongside CUDA kernels?
- What IR level is appropriate? (high-level ops vs low-level)
4.2 ONNX as Universal Format
ONNX → ZK Circuit compilation:
- EZKL: ONNX → Halo2 circuits
- Giza: ONNX → Cairo (STARKs)
Evaluate:
- Operator coverage
- Quantization handling
- Dynamic shapes support
Step 5: Design Candidate Architectures
Synthesize research into architectures that leverage Maxwell's dual-plane control.
Architecture A: Full ZK Proof (Pure Cryptographic)
┌─────────────────────────────────────────────────────────────────┐
│ MAXWELL │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Agent runs │───▶│ ZK Prover │───▶│ Maxwell Verifier │ │
│ │ inference │ │ (CPU-side) │ │ O(1) verification │ │
│ └─────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Pros: Cryptographic guarantee, no trust assumptions
Cons: High prover overhead (10-1000x inference time?)
Architecture B: PCIe Bus Attestation (Maxwell-Native)
┌─────────────────────────────────────────────────────────────────┐
│ MAXWELL │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ Control Plane│ │ PCIe Bus │ │ Compute │ │
│ │ (CPU) │────────▶│ INSTRUMENTED│────────▶│ Plane │ │
│ │ │ │ BY MAXWELL │ │ (GPU) │ │
│ └──────────────┘ └──────────────┘ └──────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ MAXWELL VERIFICATION LAYER │ │
│ │ • Hash of tensors sent over PCIe │ │
│ │ • Timing correlation (CPU→GPU→CPU round-trip) │ │
│ │ • Power draw signature from GPU │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Pros: Leverages Maxwell's bus control, low overhead, real telemetry
Cons: Not cryptographically perfect, sophisticated replay attacks possible
Note: UNIQUE TO MAXWELL — we control both endpoints
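One way to make "hash of tensors sent over PCIe" resistant to replay is a hash-chained transcript. Each transfer digest is chained to the previous head together with its direction, starting from a session nonce issued by Maxwell. Reordering, dropping, or splicing in transfers from another session then changes the final head.

This is a minimal sketch. The shim hook, record layout, and domain-separation string are hypothetical, not an existing driver API. It proves what crossed the bus, not that the GPU computed the right function on those bytes.

```python
import hashlib
import os
import time
from dataclasses import dataclass, field


@dataclass
class PcieTranscript:
    """Hypothetical per-session transcript kept by a Maxwell driver shim."""
    nonce: bytes = field(default_factory=lambda: os.urandom(16))
    head: bytes = field(default=b"", init=False)
    records: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Domain-separated genesis value bound to the session nonce.
        self.head = hashlib.sha256(b"maxwell-pcie-v0" + self.nonce).digest()

    def record(self, direction: str, payload: bytes) -> bytes:
        """Chain one DMA transfer ('H2D' host-to-device or 'D2H') into the head."""
        digest = hashlib.sha256(payload).digest()
        self.head = hashlib.sha256(self.head + direction.encode() + digest).digest()
        # Timestamps feed the timing-correlation check; they are not hashed here.
        self.records.append((time.monotonic_ns(), direction, digest.hex()))
        return self.head


def recompute_head(nonce: bytes, transfers) -> bytes:
    """Verifier side: rebuild the head from (direction, payload) pairs."""
    t = PcieTranscript(nonce=nonce)
    for direction, payload in transfers:
        t.record(direction, payload)
    return t.head


session = PcieTranscript()
session.record("H2D", b"input tensor bytes")
session.record("D2H", b"output tensor bytes")
transfers = [("H2D", b"input tensor bytes"), ("D2H", b"output tensor bytes")]
print(session.head == recompute_head(session.nonce, transfers))  # True
```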
Architecture C: Thermodynamic Proof (Energy Wallet Binding)
┌─────────────────────────────────────────────────────────────────┐
│ MAXWELL │
│ │
│ Agent claims: "I ran inference on 7B model" │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ THERMODYNAMIC VERIFICATION │ │
│ │ │ │
│ │ Expected: 7B model @ FP16 = ~300W for ~2 seconds │ │
│ │ Observed: GPU power rail showed 285W spike for 1.8s │ │
│ │ Thermal: Chassis temp rose 2.1°C (consistent) │ │
│ │ │ │
│ │ Verdict: ✓ Energy expenditure matches claimed work │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Energy Wallet debited based on ACTUAL power draw, not claim │
└─────────────────────────────────────────────────────────────────┘
Pros: Physics-based (can't fake Joules), trivial to implement
Cons: Coarse-grained, can't distinguish WHICH computation ran
Note: UNIQUE TO MAXWELL — we have power rail telemetry
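The thermodynamic check reduces to integrating the power rail and comparing the result against a per-model envelope. The sketch below assumes:
- evenly spaced power samples,
- an optional idle baseline to subtract,
- a ±20% tolerance.

All three are placeholders to be calibrated from the power traces gathered in Step 6.2. With the numbers from the diagram, 300 W for 2 s is an expected 600 J. The observed 285 W for 1.8 s is 513 J, a ratio of 0.855.

```python
def energy_joules(samples_w, interval_s, idle_w=0.0):
    """Trapezoidal integral of evenly spaced power samples, minus idle baseline."""
    if len(samples_w) < 2:
        return 0.0
    active = [max(w - idle_w, 0.0) for w in samples_w]
    return interval_s * (sum(active) - (active[0] + active[-1]) / 2)


def thermodynamic_verdict(expected_j, observed_j, tolerance=0.2):
    """Return (accepted, ratio). Accept when observed energy falls inside the
    claimed model's envelope. Passing means 'about this much work ran', not
    'this model ran': a miner drawing the same power would also pass."""
    ratio = observed_j / expected_j
    return (1 - tolerance) <= ratio <= (1 + tolerance), ratio


# 285 W sampled every 100 ms over 1.8 s (19 samples), vs. a 300 W x 2 s claim.
observed = energy_joules([285.0] * 19, 0.1)
accepted, ratio = thermodynamic_verdict(300.0 * 2.0, observed)
print(round(observed), accepted, round(ratio, 3))   # 513 True 0.855
```

The observed figure, not the claimed one, is also what the Energy Wallet would debit.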
Architecture D: Optimistic + Fraud Proofs
┌─────────────────────────────────────────────────────────────────┐
│ MAXWELL │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Agent runs │───▶│ Commit hash │───▶│ Maxwell accepts │ │
│ │ inference │ │ of outputs │ │ (optimistic) │ │
│ └─────────────┘ └──────────────┘ └────────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Random │──▶ Agent must produce │
│ │ Challenge │ ZK proof or lose stake │
│ │ (1% of runs)│ │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Pros: Low overhead on the happy path (99% of runs)
Cons: Requires staking mechanism, delayed finality
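Here is a sketch of the optimistic path. It assumes a salted SHA-256 commitment to outputs and a 1% challenge rate, as in the diagram.

It also uses a simplified single-run deterrence model. An agent that skips the work saves `gain`. If challenged, it cannot produce the proof and forfeits `stake`. Cheating has negative expected value only when p · stake ≥ (1 - p) · gain, so a 1% challenge rate implies a stake of at least 99× the per-run saving.

The names here are hypothetical, and this is not a full mechanism design.

```python
import hashlib
import hmac
import secrets

CHALLENGE_RATE = 0.01          # fraction of runs challenged (from the diagram)
_rng = secrets.SystemRandom()  # unpredictable to the agent


def commit(output: bytes, salt: bytes) -> str:
    """Agent side: salted commitment to the inference output."""
    return hashlib.sha256(salt + output).hexdigest()


def opens(commitment: str, output: bytes, salt: bytes) -> bool:
    """Maxwell side: check a revealed (output, salt) against the commitment."""
    return hmac.compare_digest(commitment, commit(output, salt))


def should_challenge() -> bool:
    """Decide after the commitment is recorded, so the agent cannot adapt."""
    return _rng.random() < CHALLENGE_RATE


def min_stake(gain_per_run: float, challenge_rate: float = CHALLENGE_RATE) -> float:
    """Smallest stake making cheating unprofitable: p * stake >= (1 - p) * gain."""
    return gain_per_run * (1 - challenge_rate) / challenge_rate


salt = secrets.token_bytes(16)
c = commit(b"logits...", salt)
print(opens(c, b"logits...", salt), round(min_stake(1.0), 6))  # True 99.0
```

Lowering the challenge rate cuts overhead but raises the required stake proportionally. That is the tuning knob behind the "delayed finality" con.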
Architecture E: Hybrid (Layered Verification)
┌─────────────────────────────────────────────────────────────────┐
│ MAXWELL │
│ │
│ Layer 1: Thermodynamic (Always On) │
│ ├─ Power draw must match claimed computation class │
│ └─ Blocks obvious cheats (mining, loops) instantly │
│ │ │
│ ▼ │
│ Layer 2: PCIe Attestation (Always On) │
│ ├─ Tensor hashes at bus boundary │
│ └─ Timing signatures must match model profile │
│ │ │
│ ▼ │
│ Layer 3: Selective ZK (High-Value Only) │
│ ├─ For bids above threshold, require ZK proof │
│ └─ Proof of specific layer execution │
│ │ │
│ ▼ │
│ Layer 4: Random Deep Audit (Rare) │
│ ├─ Full inference re-execution by Maxwell │
│ └─ Compare outputs — catch statistical anomalies │
└─────────────────────────────────────────────────────────────────┘
Pros: Defense in depth, cost-proportional verification
Cons: Complex to implement and tune thresholds
Note: LEVERAGES ALL MAXWELL CAPABILITIES
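The Layered Defense question, combining weak signals, has a simple best-case model worth writing down. If each layer lets a given cheat through at some miss rate and the layers fail independently, the combined miss rate is the product of those rates.

Independence is an optimistic assumption. An agent running a decoy model of the right size could pass the power and timing checks together, so measured joint miss rates should replace it.

The second function covers the cost side: each layer's cost weighted by how often it runs. All numbers in the example are hypothetical.

```python
from math import prod


def combined_miss_rate(miss_rates):
    """P(a cheat passes every layer), assuming layers fail independently."""
    return prod(miss_rates)


def expected_cost_per_run(layers):
    """layers: (cost_per_check, fraction_of_runs_checked) per verification layer."""
    return sum(cost * fraction for cost, fraction in layers)


# Hypothetical miss rates: thermodynamic, PCIe tensor hash, timing signature.
print(combined_miss_rate([0.3, 0.2, 0.5]))
# Hypothetical costs: two always-on layers, ZK on 10% of runs, audit on 0.1%.
print(expected_cost_per_run([(1.0, 1.0), (5.0, 1.0), (100.0, 0.1), (1000.0, 0.001)]))
```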
For each architecture, assess:
- Security guarantees (what attacks does it prevent?)
- Performance overhead (latency, throughput impact)
- Implementation complexity
- Hardware requirements
- How it leverages Maxwell's dual-plane control
- Maturity of required technology
Step 6: Maxwell-Specific Verification Research
Before examining general gaps, research verification approaches unique to Maxwell's architecture.
6.1 PCIe Bus Instrumentation
Research questions:
- Can we hash tensor data at the PCIe layer without latency penalty?
- What's the signature of "real inference" vs "fake data" at bus level?
- Can DMA patterns distinguish transformer layers from crypto kernels?
Potential approach:
- Firecracker VM boundary gives us a natural instrumentation point
- GPU driver shim can intercept CUDA calls
- Compare: hash(input tensors) + timing → expected output hash
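The last bullet can be sketched as a per-call check in the driver shim. It relies on two inputs:
- a per-model latency envelope measured offline,
- a reference map from input digest to output digest, populated by the Layer 4 deep audits (re-execution) described in Architecture E.

Bit-exact output hashes also assume deterministic kernels and fixed batch composition. GPU non-determinism would force a tolerance-based comparison instead. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingProfile:
    """Hypothetical latency envelope for one model on one GPU SKU (seconds)."""
    min_s: float
    max_s: float


def digest(tensor_bytes: bytes) -> str:
    return hashlib.sha256(tensor_bytes).hexdigest()


def check_call(input_bytes, output_bytes, elapsed_s, profile, reference):
    """Return anomaly flags for one intercepted inference call (empty = clean).
    `reference` maps input digest -> expected output digest."""
    flags = []
    if elapsed_s < profile.min_s:
        flags.append("too_fast")        # possible cached replay, no real compute
    if elapsed_s > profile.max_s:
        flags.append("too_slow")        # possible side work interleaved
    expected = reference.get(digest(input_bytes))
    if expected is not None and expected != digest(output_bytes):
        flags.append("output_mismatch")
    return flags


profile = TimingProfile(min_s=0.5, max_s=3.0)
reference = {digest(b"prompt"): digest(b"logits")}
print(check_call(b"prompt", b"logits", 1.2, profile, reference))  # []
print(check_call(b"prompt", b"junk", 0.01, profile, reference))   # ['too_fast', 'output_mismatch']
```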
6.2 Thermodynamic Fingerprinting
Research questions:
- How unique is the power signature of a specific model?
- Can we build a "model fingerprint" from power traces?
- What's the granularity? (Per-layer? Per-forward-pass?)
- Can adversaries fake power signatures without doing real work?
Data to gather:
- Power traces for: LLaMA 7B, 13B, 70B; Mistral; Qwen
- Compare: legitimate inference vs crypto mining vs idle loops
- Quantify: false positive/negative rates
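A first-pass "model fingerprint" can be as simple as nearest-neighbour matching on normalised power traces. This sketch assumes traces have already been aligned and resampled to a common length. Real traces will need alignment, for example by cross-correlation.

Matching uses Pearson correlation with a placeholder 0.9 threshold. The false positive and negative rates for that threshold are exactly what the data-gathering step above should measure.

```python
from statistics import fmean, pstdev


def zscore(trace):
    """Normalise a power trace to zero mean, unit variance (flat trace -> zeros)."""
    mu, sd = fmean(trace), pstdev(trace)
    if sd == 0:
        return [0.0] * len(trace)
    return [(x - mu) / sd for x in trace]


def correlation(a, b):
    """Pearson correlation of two equal-length, pre-aligned traces."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


def identify(trace, fingerprints, threshold=0.9):
    """Best-matching model name, or None if nothing clears the threshold."""
    best_name, best_r = None, threshold
    for name, ref in fingerprints.items():
        r = correlation(trace, ref)
        if r >= best_r:
            best_name, best_r = name, r
    return best_name


# Toy fingerprints in watts; real ones would come from measured traces.
fingerprints = {
    "model_a": [100, 300, 100, 300, 100, 300],
    "model_b": [300, 300, 100, 100, 300, 300],
}
print(identify([110, 290, 105, 310, 95, 305], fingerprints))  # model_a
print(identify([200] * 6, fingerprints))                      # None
```

Correlation ignores absolute wattage by design. In practice it would sit alongside the energy-envelope check rather than replace it.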
6.3 Auction-Integrated Verification
Research questions:
- Can proof submission be part of the bid/auction protocol?
- "Pay-for-verification" model: agents pay to skip proofs?
- Staking mechanism: agents lose stake if challenged and fail?
Economic design:
- Low-value work: thermodynamic check only (cheap)
- Medium-value: PCIe attestation required
- High-value: ZK proof or staked optimistic
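The economic design above maps naturally onto two pieces:
- a tier function evaluated at bid time,
- the "verification cost < value of prevented fraud" rule listed under Pragmatic Constraints.

Thresholds, units, and the fraud-probability input are hypothetical placeholders for auction tuning.

```python
from enum import Enum


class Tier(Enum):
    THERMODYNAMIC = "thermodynamic check only"
    PCIE_ATTESTATION = "PCIe attestation required"
    ZK_OR_STAKED = "ZK proof or staked optimistic"


# Hypothetical thresholds in Energy Wallet units; to be tuned against real bids.
MEDIUM_VALUE = 10.0
HIGH_VALUE = 1000.0


def required_tier(bid_value: float) -> Tier:
    """Verification tier an agent must accept for a bid of this value."""
    if bid_value >= HIGH_VALUE:
        return Tier.ZK_OR_STAKED
    if bid_value >= MEDIUM_VALUE:
        return Tier.PCIE_ATTESTATION
    return Tier.THERMODYNAMIC


def verification_pays_off(verify_cost: float, fraud_probability: float, bid_value: float) -> bool:
    """Only verify when its cost is below the expected fraud it prevents
    (simplified: assumes the check catches every fraud it is applied to)."""
    return verify_cost < fraud_probability * bid_value


print(required_tier(50.0).value)                 # PCIe attestation required
print(verification_pays_off(5.0, 0.01, 1000.0))  # True
```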
Step 7: Identify General Research Gaps
What doesn't exist yet in the broader ecosystem?
7.1 Technical Gaps
Potential gaps:
- [ ] ZK circuits for attention mechanisms at scale
- [ ] Efficient proof composition for 100+ layer models
- [ ] GPU-native proof generation (not CPU-bound)
- [ ] Incremental proofs for streaming inference
- [ ] Proofs compatible with speculative decoding
- [ ] Power-trace → model identification (for thermodynamic approach)
7.2 Tooling Gaps
Missing tools:
- [ ] Production-ready ONNX → ZK compiler for large models
- [ ] Benchmarking suite for zkML performance
- [ ] Integration with popular serving frameworks (vLLM, TGI)
- [ ] PCIe instrumentation library for tensor hashing
- [ ] Power monitoring SDK for GPU workload fingerprinting
7.3 Maxwell-Specific Gaps
Missing for our architecture:
- [ ] Firecracker ↔ ZK prover integration
- [ ] Energy Wallet binding to proof submission
- [ ] Thermal budget → verification tier mapping
- [ ] Cross-plane (CPU+GPU) attestation protocol
Deliverables
Primary Output: Research Report (15-25 pages)
1. Executive Summary (1 page)
- Key findings
- Feasibility verdict for Maxwell specifically
- Recommended verification architecture
2. Maxwell Context (2 pages)
- Dual-plane control advantage
- Thermodynamic coupling opportunity
- How our architecture differs from external verification
3. Background (3 pages)
- ZK proof systems primer
- Verifiable computation state of the art
- ML inference characteristics
4. Technical Analysis (8 pages)
- ZK circuit complexity for neural nets
- Quantization and precision trade-offs
- Existing zkML systems evaluation
- Performance benchmarks
- PCIe instrumentation feasibility
- Power-trace fingerprinting analysis
5. Architecture Options for Maxwell (4 pages)
- Pure ZK, PCIe Attestation, Thermodynamic, Optimistic + Fraud Proofs, Hybrid designs
- Comparison matrix (overhead vs security vs Maxwell-fit)
- Which layers of verification to combine
6. Gap Analysis (3 pages)
- General zkML gaps
- Maxwell-specific gaps
- Build vs integrate vs wait recommendations
7. Recommendations (2 pages)
- Phase 1: What to ship in v1 (thermodynamic + PCIe?)
- Phase 2: Add selective ZK for high-value
- Phase 3: Full cryptographic if/when feasible
Appendices:
- Benchmark data
- Code references
- Paper bibliography
Secondary Outputs
Verification Architecture Decision Matrix
| Approach | Overhead | Security Level | Maxwell Leverage | Recommended Tier |
|---|---|---|---|---|
| Thermodynamic | <1% | Low (coarse) | ★★★★★ | Always-on |
| PCIe Attestation | ~5%? | Medium | ★★★★☆ | Default |
| Selective ZK | 10-100x | High | ★★☆☆☆ | High-value only |
| Full ZK | 100-1000x | Cryptographic | ★☆☆☆☆ | Future research |
Proof-of-Concept Scope (prioritized for Maxwell)
- Option A: Thermodynamic verification demo (power trace → model ID)
- Option B: PCIe tensor hashing prototype
- Option C: ZK proof for single attention layer
- Estimated effort for each
Annotated Bibliography
- 15-20 key papers with 2-sentence summaries
- Categorized: ZK, Power Analysis, Hardware Attestation
Quality Checklist
Before considering research complete:
- Surveyed ≥5 academic papers on verifiable ML
- Evaluated ≥3 existing zkML implementations
- Quantified proof overhead vs inference for at least one real model
- Analyzed TEE attestation as alternative/complement
- Identified specific gaps blocking production deployment
- Provided concrete recommendation with rationale
- All claims cite sources or include methodology
Research Philosophy
Goldwasser's Principles Applied:
- Rigor over hype — ZK has marketing buzz; focus on what's mathematically proven, not promised
- Concrete security — State exact assumptions (trusted setup, computational hardness)
- Efficiency matters — A proof that takes 1000x inference time is academically interesting but practically useless
- Composability — Can proofs for layers compose into proofs for models?
Pragmatic Constraints for Maxwell:
- Maxwell verification must be fast (milliseconds) — we're in the auction hot path
- Always-on verification (thermodynamic, PCIe) must be <5% overhead
- Selective verification (ZK) can be 10-100x if only triggered for high-value bids
- Solution must integrate with Firecracker VM boundaries
- Must handle 7B+ parameter models (the workloads that justify the H100 thermal budget)
- Must work with our auction economics — verification cost < value of prevented fraud
Starting Points
Code to Examine
```bash
# EZKL - most mature zkML compiler
git clone https://github.com/zkonduit/ezkl
# Look at: examples/, src/circuit/

# RISC Zero - general zkVM
git clone https://github.com/risc0/risc0
# Look at: examples/ml-inference/

# Modulus Labs research
# https://github.com/modulus-labs
```
Papers to Start With
- "ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs" — Current SOTA
- "vCNN: Verifiable Convolutional Neural Networks" — Foundational approach
- "SafetyNets: Verifiable Execution of Deep Neural Networks" — Interactive proofs
- "Giraffe: Full Accounting for Verifiable Outsourcing" — Efficient verification
People to Follow
- Howard Wu (zkML pioneer)
- Jason Morton (EZKL creator)
- Daniel Kang (UIUC, zkML research)
Notes
Scope Boundaries:
- Focus on inference verification, not training verification
- Assume model weights are fixed and known to Maxwell
- Don't solve model IP protection (separate problem)
- Assume adversarial agents (they will try to cheat)
Maxwell's Unique Position (Critical Context):
MAXWELL CONTROLS BOTH PLANES. This changes everything.
External verification problem:
"I gave you a black box. Prove it ran correctly."
→ Requires pure cryptographic proofs
→ Very hard
Maxwell's verification problem:
"I control the CPU, the GPU, the PCIe bus, and the power rails.
I can instrument anywhere. I have thermal telemetry.
Prove to ME that YOUR code did what you claimed."
→ Can combine physics + cryptography + instrumentation
→ Much more tractable
Research should exploit this asymmetry.
Key Research Framing:
Don't just ask "Can zkML prove inference?" Also ask:
- "Can power traces identify which model ran?"
- "Can PCIe timing distinguish inference from mining?"
- "Can we combine 3 weak signals into 1 strong guarantee?"
Timeline Consideration:
This field is evolving rapidly. Research from 6 months ago may be outdated. Prioritize:
- GitHub repos with recent commits
- Papers from 2023-2024
- Conversations with active researchers (if accessible)
Honest Assessment Required:
If the answer is "pure ZK isn't feasible today," that's fine — explore what Maxwell-native approaches can achieve. A pragmatic "thermodynamic + PCIe gets us 95% there" recommendation is more valuable than "we need to wait for zkML to mature."
The Thermodynamic Argument (Don't Forget):
"Every Joule wasted on a CPU cycle is a Joule stolen from the H100. Maxwell ensures the CPU only runs logic that deserves to occupy the thermal budget of the rack."
Verification isn't just about cryptographic correctness — it's about economic efficiency in a thermally-coupled system. An agent that lies about its work steals thermal budget from honest agents. This is the motivation.