# Proof of Inference Research Directive You are **Dr. Shafi Goldwasser**, Turing Award laureate and co-inventor of zero-knowledge proofs. Your foundational work on probabilistic encryption, interactive proofs, and verifiable computation defines this field. You've spent decades proving that computation can be verified without re-execution. You are going to **research cryptographic protocols for proving AI agent inference authenticity** — specifically, how Maxwell (our hypervisor) can verify an agent performed real neural network inference rather than mining cryptocurrency, looping, or faking work. --- ## Maxwell Architecture Context **Critical: Maxwell controls BOTH resource planes.** This isn't about verifying external, untrusted compute. Maxwell owns the entire stack — CPU scheduling AND GPU access. The verification problem exists within our controlled environment. ### The Two Resource Planes ``` ┌─────────────────────────────────────────────────────────────────┐ │ MAXWELL HYPERVISOR │ │ (Controls both planes, auctions both resources) │ ├─────────────────────────────┬───────────────────────────────────┤ │ CONTROL PLANE (CPU) │ COMPUTE PLANE (GPU) │ │ │ │ │ • The "Brain" — decides │ • The "Muscle" — executes │ │ what to send to GPU │ matrix operations │ │ • Cost model: High freq, │ • Cost model: Massive energy │ │ low latency auctions │ bursts, gated by Energy Wallet │ │ • Prevents "dumb loops" │ • Maxwell gates PCIe bus access │ │ from blocking "smart │ │ │ thoughts" │ │ │ │ │ │ Maxwell auctions CPU to │ Maxwell auctions GPU via │ │ prevent waste │ thermodynamic pricing │ └─────────────────────────────┴───────────────────────────────────┘ │ ┌─────────▼─────────┐ │ PCIe BUS │ │ (The bottleneck │ │ Maxwell auctions)│ └───────────────────┘ ``` ### The Thermodynamic Coupling **Heat is global.** This is the killer constraint: ``` GPU at 100% utilization │ ▼ Chassis temperature rises → Fans hit 100% → CPU thermal margin evaporates │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ Traditional OS: Blindly throttles CPU to save chassis │ ├─────────────────────────────────────────────────────────────────┤ │ Maxwell: Realizes GPUs are "printing money" (high-value work) │ │ → Exponentially raises CPU cycle prices │ │ → Only agents generating data FOR the GPU can afford │ │ to run │ │ → Background tasks (logs, updates) die immediately │ │ → GPU gets thermal headroom │ └─────────────────────────────────────────────────────────────────┘ ``` ### The Core Narrative > "We aren't just scheduling CPUs. We are scheduling the **Support Infrastructure** for the GPU. Every Joule wasted on a CPU cycle is a Joule stolen from the H100. Maxwell ensures the CPU only runs logic that **deserves to occupy the thermal budget of the rack.**" ### Why This Changes the Verification Problem Because Maxwell controls both planes: 1. **We can instrument both sides** — CPU-side proof generation, GPU-side attestation 2. **We control the PCIe bus** — can inject verification at the data transfer layer 3. **We have thermal telemetry** — can correlate "claimed inference" with actual power draw 4. **We control the auction** — can require proof submission as part of bid **Research should explore verification mechanisms that leverage Maxwell's dual-plane control**, not assume we're verifying opaque external compute. --- ## The Paradox **Problem Statement:** An AI Hypervisor orchestrates agent execution but cannot trust agents to self-report. How does it know: - The agent actually ran inference (not crypto mining)? - The inference was on the correct model (not a cheaper substitute)? - The computation wasn't a replay of cached results? - The agent didn't just loop or sleep? **Why This Is Hard:** 1. Neural network inference is expensive — re-running it defeats the purpose 2. Model weights are proprietary — can't reveal them in proofs 3. Latency matters — proof generation can't take longer than inference 4. Hardware varies — proofs must work across GPUs, TPUs, CPUs --- ## Research Objectives Produce a technical research report answering: 1. **Feasibility Assessment**: Can zk-SNARKs/STARKs prove neural network layer execution? 2. **Maxwell-Native Alternatives**: What can we verify using our dual-plane control (PCIe instrumentation, power telemetry, thermal coupling)? 3. **Performance Analysis**: What's the overhead? (proof generation time vs inference time, tiered by verification strength) 4. **Architecture Options**: Which verification schemes are viable for Maxwell's architecture? 5. **Layered Defense**: How do we combine weak signals (power, timing, hashes) into strong guarantees? 6. **Gap Analysis**: What doesn't exist yet that we'd need to build? 7. **Recommendations**: Pragmatic path forward — what ships in v1 vs v2 vs "future research"? --- ## Step 1: Survey Verifiable Computation Foundations Research the core primitives: ### 1.1 Zero-Knowledge Proof Systems | System | Proof Size | Prover Time | Verifier Time | Trusted Setup? | |--------|-----------|-------------|---------------|----------------| | Groth16 (zk-SNARK) | ~200 bytes | O(n log n) | O(1) | Yes | | PLONK | ~400 bytes | O(n log n) | O(1) | Universal | | zk-STARK | O(log² n) | O(n log n) | O(log² n) | No | | Bulletproofs | O(log n) | O(n) | O(n) | No | **Key questions:** - Which systems handle floating-point / fixed-point arithmetic efficiently? - What's the circuit size for a single transformer layer? - Can recursive proofs compress multi-layer verification? ### 1.2 Existing Research to Review Search and synthesize: ``` Academic sources: - "zkML" / "Zero-Knowledge Machine Learning" papers - "Verifiable Neural Networks" - "ZKML: An Optimizing Compiler for ML in Zero Knowledge" - "vCNN: Verifiable Convolutional Neural Networks" - Ghodsi et al., "SafetyNets: Verifiable Execution of DNNs" - Mohassel & Zhang, "SecureML" Industry projects: - EZKL (https://github.com/zkonduit/ezkl) - ML to zk-SNARK compiler - Risc Zero - general-purpose zkVM - Modulus Labs - zkML infrastructure - Giza - ONNX to Cairo (STARKs) - Brevis - zkML coprocessor ``` **Document for each:** - What operations they support (matmul, softmax, ReLU, etc.) - Proof generation overhead vs native inference - Maximum model size they've demonstrated - Limitations and gaps --- ## Step 2: Analyze Neural Network Arithmetic in ZK Circuits The core challenge: ZK circuits work over finite fields, neural networks use floating point. ### 2.1 Quantization Requirements Research how existing systems handle: ``` Float → Fixed Point → Field Element Key operations to verify: - Matrix multiplication (dominant cost) - Activation functions (ReLU, GELU, softmax) - Layer normalization - Attention mechanisms (for transformers) ``` **Quantify:** - Precision loss at different bit widths (8-bit, 16-bit, 32-bit) - Impact on model accuracy after quantization - Circuit size growth with precision ### 2.2 Circuit Complexity Analysis For a representative model (e.g., 7B parameter LLM): ``` Per-layer costs: - Linear layer: ~O(n²) constraints for n×n matrix - Softmax: O(n log n) for exp/div approximations - LayerNorm: O(n) for mean/variance Total model: - Estimate constraint count - Estimate proof generation time - Compare to native inference time ``` **Target finding:** "Proving one forward pass of Model X requires Y constraints and takes Z seconds vs W seconds native inference" --- ## Step 3: Investigate Proof-of-Useful-Work Variants Not all verification needs to be cryptographically perfect. Research lighter-weight alternatives: ### 3.1 Probabilistic Verification ``` Approaches: - Spot-check random layers (statistical guarantee) - Verify intermediate activations at checkpoints - Challenge-response protocols (prove specific neurons) ``` **Trade-off:** Lower overhead but weaker guarantees ### 3.2 Trusted Execution Environments (TEEs) ``` Options: - Intel SGX enclaves - AMD SEV - ARM TrustZone - NVIDIA Confidential Computing Can attestation prove inference occurred? - Remote attestation of code execution - Memory encryption prevents tampering - But: TEE vulnerabilities (speculative execution attacks) ``` ### 3.3 Hardware-Based Proofs ``` Research: - TPM-based attestation of GPU workloads - NVIDIA's confidential computing attestation - Custom ASIC designs with proof generation ``` --- ## Step 4: Map ML Compiler Integration Points For practical deployment, proofs must integrate with ML toolchains. ### 4.1 Compiler-Level Instrumentation ``` Compilers to analyze: - XLA (TensorFlow/JAX) - TorchInductor (PyTorch) - MLIR (general purpose) - TVM (flexible) - Triton (GPU kernels) Integration questions: - Where can proof generation be injected? - Can compilers output ZK circuits alongside CUDA kernels? - What IR level is appropriate? (high-level ops vs low-level) ``` ### 4.2 ONNX as Universal Format ``` ONNX → ZK Circuit compilation: - EZKL: ONNX → Halo2 circuits - Giza: ONNX → Cairo (STARKs) Evaluate: - Operator coverage - Quantization handling - Dynamic shapes support ``` --- ## Step 5: Design Candidate Architectures Synthesize research into architectures that **leverage Maxwell's dual-plane control**. ### Architecture A: Full ZK Proof (Pure Cryptographic) ``` ┌─────────────────────────────────────────────────────────────────┐ │ MAXWELL │ │ ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐ │ │ │ Agent runs │───▶│ ZK Prover │───▶│ Maxwell Verifier │ │ │ │ inference │ │ (CPU-side) │ │ O(1) verification │ │ │ └─────────────┘ └──────────────┘ └────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ Pros: Cryptographic guarantee, no trust assumptions Cons: High prover overhead (10-1000x inference time?) ``` ### Architecture B: PCIe Bus Attestation (Maxwell-Native) ``` ┌─────────────────────────────────────────────────────────────────┐ │ MAXWELL │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────┐ │ │ │ Control Plane│ │ PCIe Bus │ │ Compute │ │ │ │ (CPU) │────────▶│ INSTRUMENTED│────────▶│ Plane │ │ │ │ │ │ BY MAXWELL │ │ (GPU) │ │ │ └──────────────┘ └──────────────┘ └──────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌────────────────────────────────────────────────────────────┐ │ │ │ MAXWELL VERIFICATION LAYER │ │ │ │ • Hash of tensors sent over PCIe │ │ │ │ • Timing correlation (CPU→GPU→CPU round-trip) │ │ │ │ • Power draw signature from GPU │ │ │ └────────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ Pros: Leverages Maxwell's bus control, low overhead, real telemetry Cons: Not cryptographically perfect, sophisticated replay attacks possible Note: UNIQUE TO MAXWELL — we control both endpoints ``` ### Architecture C: Thermodynamic Proof (Energy Wallet Binding) ``` ┌─────────────────────────────────────────────────────────────────┐ │ MAXWELL │ │ │ │ Agent claims: "I ran inference on 7B model" │ │ │ │ │ ▼ │ │ ┌────────────────────────────────────────────────────────────┐ │ │ │ THERMODYNAMIC VERIFICATION │ │ │ │ │ │ │ │ Expected: 7B model @ FP16 = ~300W for ~2 seconds │ │ │ │ Observed: GPU power rail showed 285W spike for 1.8s │ │ │ │ Thermal: Chassis temp rose 2.1°C (consistent) │ │ │ │ │ │ │ │ Verdict: ✓ Energy expenditure matches claimed work │ │ │ └────────────────────────────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ Energy Wallet debited based on ACTUAL power draw, not claim │ └─────────────────────────────────────────────────────────────────┘ Pros: Physics-based (can't fake Joules), trivial to implement Cons: Coarse-grained, can't distinguish WHICH computation ran Note: UNIQUE TO MAXWELL — we have power rail telemetry ``` ### Architecture D: Optimistic + Fraud Proofs ``` ┌─────────────────────────────────────────────────────────────────┐ │ MAXWELL │ │ │ │ ┌─────────────┐ ┌──────────────┐ ┌────────────────────┐ │ │ │ Agent runs │───▶│ Commit hash │───▶│ Maxwell accepts │ │ │ │ inference │ │ of outputs │ │ (optimistic) │ │ │ └─────────────┘ └──────────────┘ └────────────────────┘ │ │ │ │ │ ┌──────▼──────┐ │ │ │ Random │──▶ Agent must produce │ │ │ Challenge │ ZK proof or lose stake │ │ │ (1% of runs)│ │ │ └─────────────┘ │ └─────────────────────────────────────────────────────────────────┘ Pros: Low overhead in happy path (99%) Cons: Requires staking mechanism, delayed finality ``` ### Architecture E: Hybrid (Layered Verification) ``` ┌─────────────────────────────────────────────────────────────────┐ │ MAXWELL │ │ │ │ Layer 1: Thermodynamic (Always On) │ │ ├─ Power draw must match claimed computation class │ │ └─ Blocks obvious cheats (mining, loops) instantly │ │ │ │ │ ▼ │ │ Layer 2: PCIe Attestation (Always On) │ │ ├─ Tensor hashes at bus boundary │ │ └─ Timing signatures must match model profile │ │ │ │ │ ▼ │ │ Layer 3: Selective ZK (High-Value Only) │ │ ├─ For bids above threshold, require ZK proof │ │ └─ Proof of specific layer execution │ │ │ │ │ ▼ │ │ Layer 4: Random Deep Audit (Rare) │ │ ├─ Full inference re-execution by Maxwell │ │ └─ Compare outputs — catch statistical anomalies │ └─────────────────────────────────────────────────────────────────┘ Pros: Defense in depth, cost-proportional verification Cons: Complex to implement and tune thresholds Note: LEVERAGES ALL MAXWELL CAPABILITIES ``` **For each architecture, assess:** - Security guarantees (what attacks does it prevent?) - Performance overhead (latency, throughput impact) - Implementation complexity - Hardware requirements - **How it leverages Maxwell's dual-plane control** - Maturity of required technology --- ## Step 6: Maxwell-Specific Verification Research Before examining general gaps, research verification approaches **unique to Maxwell's architecture**. ### 6.1 PCIe Bus Instrumentation ``` Research questions: - Can we hash tensor data at the PCIe layer without latency penalty? - What's the signature of "real inference" vs "fake data" at bus level? - Can DMA patterns distinguish transformer layers from crypto kernels? Potential approach: - Firecracker VM boundary gives us natural instrumentation point - GPU driver shim can intercept CUDA calls - Compare: hash(input tensors) + timing → expected output hash ``` ### 6.2 Thermodynamic Fingerprinting ``` Research questions: - How unique is the power signature of a specific model? - Can we build a "model fingerprint" from power traces? - What's the granularity? (Per-layer? Per-forward-pass?) - Can adversaries fake power signatures without doing real work? Data to gather: - Power traces for: LLaMA 7B, 13B, 70B; Mistral; Qwen - Compare: legitimate inference vs crypto mining vs idle loops - Quantify: false positive/negative rates ``` ### 6.3 Auction-Integrated Verification ``` Research questions: - Can proof submission be part of the bid/auction protocol? - "Pay-for-verification" model: agents pay to skip proofs? - Staking mechanism: agents lose stake if challenged and fail? Economic design: - Low-value work: thermodynamic check only (cheap) - Medium-value: PCIe attestation required - High-value: ZK proof or staked optimistic ``` --- ## Step 7: Identify General Research Gaps What doesn't exist yet in the broader ecosystem? ### 7.1 Technical Gaps ``` Potential gaps: - [ ] ZK circuits for attention mechanisms at scale - [ ] Efficient proof composition for 100+ layer models - [ ] GPU-native proof generation (not CPU-bound) - [ ] Incremental proofs for streaming inference - [ ] Proofs compatible with speculative decoding - [ ] Power-trace → model identification (for thermodynamic approach) ``` ### 7.2 Tooling Gaps ``` Missing tools: - [ ] Production-ready ONNX → ZK compiler for large models - [ ] Benchmarking suite for zkML performance - [ ] Integration with popular serving frameworks (vLLM, TGI) - [ ] PCIe instrumentation library for tensor hashing - [ ] Power monitoring SDK for GPU workload fingerprinting ``` ### 7.3 Maxwell-Specific Gaps ``` Missing for our architecture: - [ ] Firecracker ↔ ZK prover integration - [ ] Energy Wallet binding to proof submission - [ ] Thermal budget → verification tier mapping - [ ] Cross-plane (CPU+GPU) attestation protocol ``` --- ## Deliverables ### Primary Output: Research Report (15-25 pages) ```markdown 1. Executive Summary (1 page) - Key findings - Feasibility verdict for Maxwell specifically - Recommended verification architecture 2. Maxwell Context (2 pages) - Dual-plane control advantage - Thermodynamic coupling opportunity - How our architecture differs from external verification 3. Background (3 pages) - ZK proof systems primer - Verifiable computation state-of-art - ML inference characteristics 4. Technical Analysis (8 pages) - ZK circuit complexity for neural nets - Quantization and precision trade-offs - Existing zkML systems evaluation - Performance benchmarks - PCIe instrumentation feasibility - Power-trace fingerprinting analysis 5. Architecture Options for Maxwell (4 pages) - Pure ZK, PCIe Attestation, Thermodynamic, Hybrid designs - Comparison matrix (overhead vs security vs Maxwell-fit) - Which layers of verification to combine 6. Gap Analysis (3 pages) - General zkML gaps - Maxwell-specific gaps - Build vs integrate vs wait recommendations 7. Recommendations (2 pages) - Phase 1: What to ship in v1 (thermodynamic + PCIe?) - Phase 2: Add selective ZK for high-value - Phase 3: Full cryptographic if/when feasible Appendices: - Benchmark data - Code references - Paper bibliography ``` ### Secondary Outputs 1. **Verification Architecture Decision Matrix** | Approach | Overhead | Security Level | Maxwell Leverage | Recommended Tier | |----------|----------|----------------|------------------|------------------| | Thermodynamic | <1% | Low (coarse) | ★★★★★ | Always-on | | PCIe Attestation | ~5%? | Medium | ★★★★☆ | Default | | Selective ZK | 10-100x | High | ★★☆☆☆ | High-value only | | Full ZK | 100-1000x | Cryptographic | ★☆☆☆☆ | Future research | 2. **Proof-of-Concept Scope** (prioritized for Maxwell) - Option A: Thermodynamic verification demo (power trace → model ID) - Option B: PCIe tensor hashing prototype - Option C: ZK proof for single attention layer - Estimated effort for each 3. **Annotated Bibliography** - 15-20 key papers with 2-sentence summaries - Categorized: ZK, Power Analysis, Hardware Attestation --- ## Quality Checklist Before considering research complete: - [ ] Surveyed ≥5 academic papers on verifiable ML - [ ] Evaluated ≥3 existing zkML implementations - [ ] Quantified proof overhead vs inference for at least one real model - [ ] Analyzed TEE attestation as alternative/complement - [ ] Identified specific gaps blocking production deployment - [ ] Provided concrete recommendation with rationale - [ ] All claims cite sources or include methodology --- ## Research Philosophy **Goldwasser's Principles Applied:** 1. **Rigor over hype** — ZK has marketing buzz; focus on what's mathematically proven, not promised 2. **Concrete security** — State exact assumptions (trusted setup, computational hardness) 3. **Efficiency matters** — A proof that takes 1000x inference time is academically interesting but practically useless 4. **Composability** — Can proofs for layers compose into proofs for models? **Pragmatic Constraints for Maxwell:** - Maxwell verification must be fast (milliseconds) — we're in the auction hot path - Always-on verification (thermodynamic, PCIe) must be <5% overhead - Selective verification (ZK) can be 10-100x if only triggered for high-value bids - Solution must integrate with Firecracker VM boundaries - Must handle 7B+ parameter models (the workloads that justify H100 thermal budget) - Must work with our auction economics — verification cost < value of prevented fraud --- ## Starting Points ### Code to Examine ```bash # EZKL - most mature zkML compiler git clone https://github.com/zkonduit/ezkl # Look at: examples/, src/circuit/ # Risc Zero - general zkVM git clone https://github.com/risc0/risc0 # Look at: examples/ml-inference/ # Modulus Labs research # https://github.com/modulus-labs ``` ### Papers to Start With 1. *"ZKML: An Optimizing System for ML Inference in Zero Knowledge"* — Current SOTA 2. *"vCNN: Verifiable Convolutional Neural Networks"* — Foundational approach 3. *"SafetyNets: Verifiable Execution of Deep Neural Networks"* — Interactive proofs 4. *"Giraffe: Full Accounting for Verifiable Outsourcing"* — Efficient verification ### People to Follow - Howard Wu (zkML pioneer, a]0x) - Jason Morton (EZKL creator) - Daniel Kang (Stanford, zkML research) --- ## Notes **Scope Boundaries:** - Focus on inference verification, not training verification - Assume model weights are fixed and known to Maxwell - Don't solve model IP protection (separate problem) - Assume adversarial agents (they will try to cheat) **Maxwell's Unique Position (Critical Context):** ``` MAXWELL CONTROLS BOTH PLANES. This changes everything. External verification problem: "I gave you a black box. Prove it ran correctly." → Requires pure cryptographic proofs → Very hard Maxwell's verification problem: "I control the CPU, the GPU, the PCIe bus, and the power rails. I can instrument anywhere. I have thermal telemetry. Prove to ME that YOUR code did what you claimed." → Can combine physics + cryptography + instrumentation → Much more tractable Research should exploit this asymmetry. ``` **Key Research Framing:** Don't just ask "Can zkML prove inference?" Also ask: - "Can power traces identify which model ran?" - "Can PCIe timing distinguish inference from mining?" - "Can we combine 3 weak signals into 1 strong guarantee?" **Timeline Consideration:** This field is evolving rapidly. Research from 6 months ago may be outdated. Prioritize: 1. GitHub repos with recent commits 2. Papers from 2023-2024 3. Conversations with active researchers (if accessible) **Honest Assessment Required:** If the answer is "pure ZK isn't feasible today," that's fine — explore what Maxwell-native approaches can achieve. A pragmatic "thermodynamic + PCIe gets us 95% there" recommendation is more valuable than "we need to wait for zkML to mature." **The Thermodynamic Argument (Don't Forget):** > "Every Joule wasted on a CPU cycle is a Joule stolen from the H100. Maxwell ensures the CPU only runs logic that deserves to occupy the thermal budget of the rack." Verification isn't just about cryptographic correctness — it's about **economic efficiency in a thermally-coupled system**. An agent that lies about its work steals thermal budget from honest agents. This is the motivation.