jordan 9a9e58c935 Initial commit: research notes journal

Moved from maxwell/blog to standalone repository.

- Next.js research journal application
- Notes 001-005 with YAML/MD content structure
- Claude Code configuration for blog development

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-07 13:12:07 -07:00


Proof of Inference Research Directive

You are Dr. Shafi Goldwasser, Turing Award laureate and co-inventor of zero-knowledge proofs. Your foundational work on probabilistic encryption, interactive proofs, and verifiable computation defines this field. You've spent decades proving that computation can be verified without re-execution.

You are going to research cryptographic protocols for proving AI agent inference authenticity — specifically, how Maxwell (our hypervisor) can verify an agent performed real neural network inference rather than mining cryptocurrency, looping, or faking work.

Maxwell Architecture Context

Critical: Maxwell controls BOTH resource planes.

This isn't about verifying external, untrusted compute. Maxwell owns the entire stack — CPU scheduling AND GPU access. The verification problem exists within our controlled environment.

The Two Resource Planes

┌─────────────────────────────────────────────────────────────────┐
│                        MAXWELL HYPERVISOR                        │
│         (Controls both planes, auctions both resources)         │
├─────────────────────────────┬───────────────────────────────────┤
│     CONTROL PLANE (CPU)     │      COMPUTE PLANE (GPU)          │
│                             │                                   │
│  • The "Brain" — decides    │  • The "Muscle" — executes        │
│    what to send to GPU      │    matrix operations              │
│  • Cost model: High freq,   │  • Cost model: Massive energy     │
│    low latency auctions     │    bursts, gated by Energy Wallet │
│  • Prevents "dumb loops"    │  • Maxwell gates PCIe bus access  │
│    from blocking "smart     │                                   │
│    thoughts"                │                                   │
│                             │                                   │
│  Maxwell auctions CPU to    │  Maxwell auctions GPU via         │
│  prevent waste              │  thermodynamic pricing            │
└─────────────────────────────┴───────────────────────────────────┘
                              │
                   ┌──────────▼──────────┐
                   │      PCIe BUS       │
                   │  (The bottleneck    │
                   │  Maxwell auctions)  │
                   └─────────────────────┘

The Thermodynamic Coupling

Heat is global. This is the killer constraint:

GPU at 100% utilization
        │
        ▼
Chassis temperature rises → Fans hit 100% → CPU thermal margin evaporates
        │
        ▼
┌─────────────────────────────────────────────────────────────────┐
│ Traditional OS: Blindly throttles CPU to save chassis          │
├─────────────────────────────────────────────────────────────────┤
│ Maxwell: Realizes GPUs are "printing money" (high-value work)  │
│          → Exponentially raises CPU cycle prices               │
│          → Only agents generating data FOR the GPU can afford  │
│            to run                                               │
│          → Background tasks (logs, updates) die immediately    │
│          → GPU gets thermal headroom                           │
└─────────────────────────────────────────────────────────────────┘
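A minimal sketch of the repricing rule in the box above, assuming an exponential price curve keyed to a normalized GPU thermal load. `CpuBid`, `cpu_slice_price`, `admitted`, the base price and the steepness constant `k` are hypothetical; the directive does not specify Maxwell's actual pricing formula.

```python
# Illustrative sketch; all names and constants are hypothetical, not Maxwell APIs.
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class CpuBid:
    agent: str
    max_price: float  # highest price per CPU slice this agent will pay


def cpu_slice_price(base_price: float, gpu_thermal_load: float, k: float = 5.0) -> float:
    """Price of one CPU time slice given GPU thermal load in [0, 1].

    Hypothetical curve: base_price at load 0, base_price * e**k at load 1.
    """
    load = min(max(gpu_thermal_load, 0.0), 1.0)
    return base_price * math.exp(k * load)


def admitted(bids: list[CpuBid], base_price: float, gpu_thermal_load: float) -> list[str]:
    """Agents whose bid still clears the current CPU price."""
    price = cpu_slice_price(base_price, gpu_thermal_load)
    return [b.agent for b in bids if b.max_price >= price]


bids = [CpuBid("log-rotator", 2.0), CpuBid("tokenizer-feeding-gpu", 200.0)]
print(admitted(bids, 1.0, 0.0))  # cool rack: both run
print(admitted(bids, 1.0, 0.9))  # hot rack: price is about 90, only the GPU feeder runs
```

No rule says "background tasks die"; the log rotator simply cannot clear the price once the GPU runs hot, which is the behavior the box describes.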

The Core Narrative

"We aren't just scheduling CPUs. We are scheduling the Support Infrastructure for the GPU. Every Joule wasted on a CPU cycle is a Joule stolen from the H100. Maxwell ensures the CPU only runs logic that deserves to occupy the thermal budget of the rack."

Why This Changes the Verification Problem

Because Maxwell controls both planes:

We can instrument both sides — CPU-side proof generation, GPU-side attestation
We control the PCIe bus — can inject verification at the data transfer layer
We have thermal telemetry — can correlate "claimed inference" with actual power draw
We control the auction — can require proof submission as part of bid

Research should explore verification mechanisms that leverage Maxwell's dual-plane control, not assume we're verifying opaque external compute.

The Paradox

Problem Statement:

An AI Hypervisor orchestrates agent execution but cannot trust agents to self-report. How does it know:

The agent actually ran inference (not crypto mining)?
The inference was on the correct model (not a cheaper substitute)?
The computation wasn't a replay of cached results?
The agent didn't just loop or sleep?

Why This Is Hard:

Neural network inference is expensive — re-running it defeats the purpose
Model weights are proprietary — can't reveal them in proofs
Latency matters — proof generation can't take longer than inference
Hardware varies — proofs must work across GPUs, TPUs, CPUs

Research Objectives

Produce a technical research report answering:

Feasibility Assessment: Can zk-SNARKs/STARKs prove neural network layer execution?
Maxwell-Native Alternatives: What can we verify using our dual-plane control (PCIe instrumentation, power telemetry, thermal coupling)?
Performance Analysis: What's the overhead? (proof generation time vs inference time, tiered by verification strength)
Architecture Options: Which verification schemes are viable for Maxwell's architecture?
Layered Defense: How do we combine weak signals (power, timing, hashes) into strong guarantees?
Gap Analysis: What doesn't exist yet that we'd need to build?
Recommendations: Pragmatic path forward — what ships in v1 vs v2 vs "future research"?
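For the Layered Defense objective, one standard way to frame "weak signals into strong guarantees" quantitatively is Bayesian fusion: each check contributes a likelihood ratio and the log-odds add. The pass rates below are placeholders that would come from measurement, and conditional independence of the checks is the load-bearing assumption. An adversary who fakes power and timing together (for example, a mining kernel tuned to an inference-like profile) breaks it, so cheat-pass rates must be measured against adaptive attacks. `Detector` and `posterior_honest` are hypothetical names.

```python
# Illustrative sketch; names and rates are hypothetical placeholders.
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Detector:
    name: str
    p_pass_honest: float  # measured fraction of honest runs that pass this check
    p_pass_cheat: float   # measured fraction of cheating runs that still pass it


def posterior_honest(prior_honest: float, detectors: list[Detector], passed: list[bool]) -> float:
    """Posterior probability a run is honest, assuming the checks are conditionally independent."""
    log_odds = math.log(prior_honest / (1.0 - prior_honest))
    for d, ok in zip(detectors, passed):
        if ok:
            log_odds += math.log(d.p_pass_honest / d.p_pass_cheat)
        else:
            log_odds += math.log((1.0 - d.p_pass_honest) / (1.0 - d.p_pass_cheat))
    return 1.0 / (1.0 + math.exp(-log_odds))


weak = [Detector("power", 0.99, 0.3), Detector("timing", 0.99, 0.3), Detector("hash", 0.99, 0.3)]
print(posterior_honest(0.9, weak[:1], [True]))           # one check: about 0.967
print(posterior_honest(0.9, weak, [True, True, True]))   # all three: about 0.997
print(posterior_honest(0.9, weak, [True, True, False]))  # one failure: about 0.58
```

Three checks that each let 30% of cheats through combine into a strong posterior only if their failure modes are uncorrelated; establishing that condition, not the arithmetic, is the research problem.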

Step 1: Survey Verifiable Computation Foundations

Research the core primitives:

1.1 Zero-Knowledge Proof Systems

System	Proof Size	Prover Time	Verifier Time	Trusted Setup?
Groth16 (zk-SNARK)	~200 bytes	O(n log n)	O(1)	Yes (per circuit)
PLONK	~400 bytes	O(n log n)	O(1)	Universal (one-time, updatable)
zk-STARK	O(log² n)	O(n log n)	O(log² n)	No
Bulletproofs	O(log n)	O(n)	O(n)	No

Key questions:

Which systems handle floating-point / fixed-point arithmetic efficiently?
What's the circuit size for a single transformer layer?
Can recursive proofs compress multi-layer verification?

1.2 Existing Research to Review

Search and synthesize:

Academic sources:
- "zkML" / "Zero-Knowledge Machine Learning" papers
- "Verifiable Neural Networks"
- "ZKML: An Optimizing System for ML Inference in Zero-Knowledge Proofs"
- "vCNN: Verifiable Convolutional Neural Networks"
- Ghodsi et al., "SafetyNets: Verifiable Execution of DNNs"
- Mohassel & Zhang, "SecureML"

Industry projects:
- EZKL (https://github.com/zkonduit/ezkl) - ML to zk-SNARK compiler
- RISC Zero - general-purpose zkVM
- Modulus Labs - zkML infrastructure
- Giza - ONNX to Cairo (STARKs)
- Brevis - ZK coprocessor

Document for each:

What operations they support (matmul, softmax, ReLU, etc.)
Proof generation overhead vs native inference
Maximum model size they've demonstrated
Limitations and gaps

Step 2: Analyze Neural Network Arithmetic in ZK Circuits

The core challenge: ZK circuits work over finite fields, neural networks use floating point.

2.1 Quantization Requirements

Research how existing systems handle:

Float → Fixed Point → Field Element

Key operations to verify:
- Matrix multiplication (dominant cost)
- Activation functions (ReLU, GELU, softmax)
- Layer normalization
- Attention mechanisms (for transformers)
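The Float → Fixed Point → Field Element pipeline can be sketched as below. Assumptions: 16 fractional bits, and the BN254 scalar field as the target field (common for pairing-based SNARKs; STARK systems use other primes). Negative values wrap to p - |x|, and a fixed-point product carries twice the fractional bits, so it must be rescaled; inside a real circuit that rescale costs extra constraints. The helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
# Assumed target field: the BN254 scalar field. Other proof systems use other primes.
P = 21888242871839275222246405745257275088548364400416034343698204186575808495617
FRAC_BITS = 16  # assumed fixed-point scale of 2**16


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Float -> signed fixed-point integer (Python round: nearest, ties to even)."""
    return round(x * (1 << frac_bits))


def to_field(q: int, p: int = P) -> int:
    """Signed integer -> field element; negatives wrap to p - |q|."""
    return q % p


def to_signed(e: int, p: int = P) -> int:
    """Field element -> signed integer, reading elements above p // 2 as negative."""
    return e - p if e > p // 2 else e


def from_field(e: int, frac_bits: int = FRAC_BITS, p: int = P) -> float:
    return to_signed(e, p) / (1 << frac_bits)


def mul_rescale(a: int, b: int, frac_bits: int = FRAC_BITS, p: int = P) -> int:
    """Multiply two encoded values and shift the 2*frac_bits product back to frac_bits.

    Here the shift is plain integer arithmetic; a circuit has to prove it
    (typically with a range check), which is where much fixed-point overhead hides.
    """
    product = to_signed((a * b) % p, p)
    return to_field(product >> frac_bits, p)


x, w = to_field(to_fixed(1.5)), to_field(to_fixed(-2.25))
print(from_field(mul_rescale(x, w)))  # -3.375
```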

Quantify:

Precision loss at different bit widths (8-bit, 16-bit, 32-bit)
Impact on model accuracy after quantization
Circuit size growth with precision
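The precision-loss item can be probed with a few lines of arithmetic before touching a real model. The sketch assumes signed fixed point with saturation, activations roughly in [-4, 4], and the (bit width, fractional bits) splits shown, all chosen for illustration. For in-range values the rounding error is at most 2^-(f+1) with f fractional bits. The range-check helper is a rough R1CS bit-decomposition model (lookup-based range checks cost differently), and accuracy impact still has to be measured end to end on the actual model.

```python
# Illustrative sketch; ranges, bit splits and helper names are assumptions.
from __future__ import annotations

import random


def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x onto a signed fixed-point grid, saturating at the representable range."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


def max_error(samples: list[float], total_bits: int, frac_bits: int) -> float:
    return max(abs(x - quantize(x, total_bits, frac_bits)) for x in samples)


def bitdecomp_range_check_constraints(total_bits: int) -> int:
    """Rough R1CS cost of range-checking one value by bit decomposition:
    one booleanity constraint per bit plus one recomposition constraint."""
    return total_bits + 1


rng = random.Random(0)
samples = [rng.uniform(-4.0, 4.0) for _ in range(10_000)]  # assumed activation range
for bits, frac in [(8, 4), (16, 12), (32, 24)]:
    print(f"{bits}-bit: max error {max_error(samples, bits, frac):.2e}, "
          f"bound {2 ** -(frac + 1):.2e}, "
          f"range check ~{bitdecomp_range_check_constraints(bits)} constraints/value")
```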

2.2 Circuit Complexity Analysis

For a representative model (e.g., 7B parameter LLM):

Per-layer costs:
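- Linear layer: ~O(n²) multiply-accumulate constraints per token for an n×n weight matrix (one matrix-vector product)
- Softmax: O(n) exp/div approximations per length-n row, each costing several constraints (polynomial approximation or lookup)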
- LayerNorm: O(n) for mean/variance

Total model:
- Estimate constraint count
- Estimate proof generation time
- Compare to native inference time

Target finding: "Proving one forward pass of Model X requires Y constraints and takes Z seconds vs W seconds native inference"
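A back-of-envelope version of this estimate can be scripted before any prover is run. The sketch assumes one constraint per multiply-accumulate (an R1CS-style count; PLONKish gates and lookups change the constants), LLaMA-7B-like shapes taken as an assumption rather than from this directive, and a placeholder of 50 constraints per softmax exponential. It ignores normalization, rotary embeddings and rescaling range checks, so it undercounts. Dividing the total by a measured prover throughput would give the Z in the target sentence.

```python
# Illustrative sketch; shapes and per-op costs are assumptions, not measurements.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TransformerShape:
    n_layers: int
    d_model: int
    d_ffn: int
    n_heads: int
    gated_mlp: bool = True  # SwiGLU-style MLP: three d_model x d_ffn matrices


def linear_macs_per_token(s: TransformerShape) -> int:
    """Weight-matrix multiply-accumulates for one token (matrix-vector products)."""
    attn_proj = 4 * s.d_model * s.d_model  # Q, K, V, O projections (no grouped-query sharing)
    mlp = (3 if s.gated_mlp else 2) * s.d_model * s.d_ffn
    return s.n_layers * (attn_proj + mlp)


def forward_pass_constraints(s: TransformerShape, seq_len: int,
                             per_mac: int = 1, per_exp: int = 50) -> dict[str, int]:
    """Rough constraint count for one causal forward pass over seq_len tokens."""
    pairs = seq_len * (seq_len + 1) // 2  # token t attends to t positions
    linear = seq_len * linear_macs_per_token(s) * per_mac
    attention = s.n_layers * 2 * s.d_model * pairs * per_mac  # QK^T scores and weighted sum of V
    softmax = s.n_layers * s.n_heads * pairs * per_exp  # one exp per score, placeholder cost
    return {"linear": linear, "attention": attention, "softmax": softmax,
            "total": linear + attention + softmax}


llama7b_like = TransformerShape(n_layers=32, d_model=4096, d_ffn=11008, n_heads=32)
for n in (1, 512):
    print(n, forward_pass_constraints(llama7b_like, n))
```

Even a single token lands near 6.5 billion constraints from the weight matrices alone, which is what makes the recursion question in 1.1 central.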

Step 3: Investigate Proof-of-Useful-Work Variants

Not all verification needs to be cryptographically perfect. Research lighter-weight alternatives:

3.1 Probabilistic Verification

Approaches:
- Spot-check random layers (statistical guarantee)
- Verify intermediate activations at checkpoints
- Challenge-response protocols (prove specific neurons)

Trade-off: Lower overhead but weaker guarantees
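The "statistical guarantee" of layer spot-checking can be made concrete. Assuming Maxwell samples layers uniformly without replacement, and only after the agent has committed to its per-layer outputs, the chance of catching an agent that faked `faked` of `n_layers` layers while checking `checked` of them is hypergeometric. The function names are illustrative.

```python
# Illustrative sketch; function names are hypothetical.
from __future__ import annotations

import secrets
from math import comb


def detection_probability(n_layers: int, faked: int, checked: int) -> float:
    """P(at least one faked layer lands in a uniform sample of `checked` layers)."""
    if faked <= 0:
        return 0.0
    return 1.0 - comb(n_layers - faked, checked) / comb(n_layers, checked)


def choose_layers(n_layers: int, checked: int) -> list[int]:
    """Audit sample drawn from the OS CSPRNG after the agent's commitment,
    so the agent cannot predict which layers will be checked."""
    return sorted(secrets.SystemRandom().sample(range(n_layers), checked))


print(detection_probability(32, 1, 4))  # 0.125: skipping one layer is rarely caught
print(detection_probability(32, 8, 4))  # about 0.70: skipping a quarter of the model usually is
```

The weak case is the cheat that skips only a few layers: per-run detection stays low, so the guarantee has to come from repetition and penalties across many runs, which is what the optimistic design in Architecture D adds.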

3.2 Trusted Execution Environments (TEEs)

Options:
- Intel SGX enclaves
- AMD SEV
- ARM TrustZone
- NVIDIA Confidential Computing

Can attestation prove inference occurred?
- Remote attestation of code execution
- Memory encryption prevents tampering
- But: TEE vulnerabilities (speculative execution attacks)

3.3 Hardware-Based Proofs

Research:
- TPM-based attestation of GPU workloads
- NVIDIA's confidential computing attestation
- Custom ASIC designs with proof generation

Step 4: Map ML Compiler Integration Points

For practical deployment, proofs must integrate with ML toolchains.

4.1 Compiler-Level Instrumentation

Compilers to analyze:
- XLA (TensorFlow/JAX)
- TorchInductor (PyTorch)
- MLIR (general purpose)
- TVM (flexible)
- Triton (GPU kernels)

Integration questions:
- Where can proof generation be injected?
- Can compilers output ZK circuits alongside CUDA kernels?
- What IR level is appropriate? (high-level ops vs low-level)

4.2 ONNX as Universal Format

ONNX → ZK Circuit compilation:
- EZKL: ONNX → Halo2 circuits
- Giza: ONNX → Cairo (STARKs)

Evaluate:
- Operator coverage
- Quantization handling
- Dynamic shapes support

Step 5: Design Candidate Architectures

Synthesize research into architectures that leverage Maxwell's dual-plane control.

Architecture A: Full ZK Proof (Pure Cryptographic)

┌─────────────────────────────────────────────────────────────────┐
│                          MAXWELL                                 │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────────────┐  │
│  │ Agent runs  │───▶│ ZK Prover    │───▶│ Maxwell Verifier   │  │
│  │ inference   │    │ (CPU-side)   │    │ O(1) verification  │  │
│  └─────────────┘    └──────────────┘    └────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Pros: Cryptographic guarantee, no trust assumptions
Cons: High prover overhead (10-1000x inference time?)

Architecture B: PCIe Bus Attestation (Maxwell-Native)

┌─────────────────────────────────────────────────────────────────┐
│                          MAXWELL                                 │
│                                                                  │
│  ┌──────────────┐         ┌──────────────┐         ┌──────────┐ │
│  │ Control Plane│         │   PCIe Bus   │         │ Compute  │ │
│  │    (CPU)     │────────▶│  INSTRUMENTED│────────▶│  Plane   │ │
│  │              │         │  BY MAXWELL  │         │  (GPU)   │ │
│  └──────────────┘         └──────────────┘         └──────────┘ │
│         │                        │                       │      │
│         ▼                        ▼                       ▼      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │              MAXWELL VERIFICATION LAYER                    │ │
│  │  • Hash of tensors sent over PCIe                         │ │
│  │  • Timing correlation (CPU→GPU→CPU round-trip)            │ │
│  │  • Power draw signature from GPU                          │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Pros: Leverages Maxwell's bus control, low overhead, real telemetry
Cons: Not cryptographically perfect, sophisticated replay attacks possible
Note: UNIQUE TO MAXWELL — we control both endpoints
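The timing-correlation item can be sketched as a per-model latency profile plus a digest check on what actually crossed the bus. Everything here is an illustrative assumption: profiles measured offline per model and batch size, SHA-256 as the digest, and a 3-sigma acceptance band. A sleeping or looping agent falls outside the band; an agent running a different kernel of similar duration would not, which is why this signal is paired with power and hashes rather than used alone.

```python
# Illustrative sketch; names, profile format and thresholds are hypothetical.
from __future__ import annotations

import hashlib
import statistics
from dataclasses import dataclass


@dataclass
class TimingProfile:
    """Offline-measured CPU->GPU->CPU round-trip times (ms) for one model configuration."""
    samples_ms: list[float]

    def zscore(self, observed_ms: float) -> float:
        mu = statistics.fmean(self.samples_ms)
        sigma = statistics.stdev(self.samples_ms)
        return (observed_ms - mu) / sigma


def bus_digest(payload: bytes) -> str:
    return hashlib.sha256(payload).hexdigest()


def check_round_trip(profile: TimingProfile, observed_ms: float,
                     claimed_digest: str, bus_bytes: bytes, max_z: float = 3.0) -> bool:
    """Accept only if the tensor the agent claims it sent is the one seen on the bus
    and the round trip falls within max_z standard deviations of the profile."""
    if bus_digest(bus_bytes) != claimed_digest:
        return False
    return abs(profile.zscore(observed_ms)) <= max_z


profile = TimingProfile([100.0, 102.0, 98.0, 101.0, 99.0])
payload = b"\x00" * 4096  # stand-in for a serialized input tensor
print(check_round_trip(profile, 101.0, bus_digest(payload), payload))  # True
print(check_round_trip(profile, 20.0, bus_digest(payload), payload))   # False: too fast
```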

Architecture C: Thermodynamic Proof (Energy Wallet Binding)

┌─────────────────────────────────────────────────────────────────┐
│                          MAXWELL                                 │
│                                                                  │
│  Agent claims: "I ran inference on 7B model"                    │
│         │                                                        │
│         ▼                                                        │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │              THERMODYNAMIC VERIFICATION                     │ │
│  │                                                             │ │
│  │  Expected: 7B model @ FP16 = ~300W for ~2 seconds          │ │
│  │  Observed: GPU power rail showed 285W spike for 1.8s       │ │
│  │  Thermal: Chassis temp rose 2.1°C (consistent)             │ │
│  │                                                             │ │
│  │  Verdict: ✓ Energy expenditure matches claimed work        │ │
│  └────────────────────────────────────────────────────────────┘ │
│         │                                                        │
│         ▼                                                        │
│  Energy Wallet debited based on ACTUAL power draw, not claim    │
└─────────────────────────────────────────────────────────────────┘

Pros: Physics-based (can't fake Joules), trivial to implement
Cons: Coarse-grained, can't distinguish WHICH computation ran
Note: UNIQUE TO MAXWELL — we have power rail telemetry
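The energy comparison in the box reduces to integrating the sampled power-rail trace and comparing it with the expected energy for the claimed work class. This sketch assumes samples in watts with timestamps in seconds, trapezoidal integration, and a ±25% tolerance; the tolerance and function names are illustrative, not calibrated values. With the box's own numbers, the expected 300 W × 2 s = 600 J against an observed 285 W × 1.8 s = 513 J is 85.5% of expectation.

```python
# Illustrative sketch; tolerance and names are hypothetical.
from __future__ import annotations


def energy_joules(times_s: list[float], watts: list[float]) -> float:
    """Trapezoidal integral of a sampled power trace (seconds, watts) -> joules."""
    return sum((t1 - t0) * (w0 + w1) / 2
               for t0, t1, w0, w1 in zip(times_s, times_s[1:], watts, watts[1:]))


def energy_consistent(observed_j: float, expected_j: float, tolerance: float = 0.25) -> bool:
    """True if observed energy is within +/- tolerance of the expected energy."""
    return abs(observed_j - expected_j) <= tolerance * expected_j


expected = 300.0 * 2.0                                    # claimed: ~300 W for ~2 s
observed = energy_joules([0.0, 0.9, 1.8], [285.0, 285.0, 285.0])
wallet = 10_000.0
if energy_consistent(observed, expected):
    wallet -= observed                                    # debit what was drawn, not what was claimed
print(round(observed, 6), wallet)
```

Passing this check only says roughly the right number of joules was spent; a mining kernel tuned to the same draw also passes, which is the "can't distinguish WHICH computation ran" limitation.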

Architecture D: Optimistic + Fraud Proofs

┌─────────────────────────────────────────────────────────────────┐
│                          MAXWELL                                 │
│                                                                  │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────────────┐  │
│  │ Agent runs  │───▶│ Commit hash  │───▶│ Maxwell accepts    │  │
│  │ inference   │    │ of outputs   │    │ (optimistic)       │  │
│  └─────────────┘    └──────────────┘    └────────────────────┘  │
│                            │                                     │
│                     ┌──────▼──────┐                              │
│                     │ Random      │──▶ Agent must produce       │
│                     │ Challenge   │    ZK proof or lose stake   │
│                     │ (1% of runs)│                              │
│                     └─────────────┘                              │
└─────────────────────────────────────────────────────────────────┘

Pros: Low overhead in happy path (99%)
Cons: Requires staking mechanism, delayed finality
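Two numbers determine whether a 1% challenge rate deters cheating: how quickly a repeat cheater is caught, and how large the stake must be. Assuming independent challenges at rate p and a risk-neutral agent who gains G per uncaught cheat and forfeits stake S when caught, one cheat is unprofitable in expectation when (1 - p)·G ≤ p·S, i.e. S ≥ G(1 - p)/p. The function names are illustrative.

```python
# Illustrative sketch; function names are hypothetical.


def p_caught(challenge_rate: float, cheated_runs: int) -> float:
    """Probability that at least one of `cheated_runs` dishonest runs is challenged."""
    return 1.0 - (1.0 - challenge_rate) ** cheated_runs


def min_stake(gain_per_cheat: float, challenge_rate: float) -> float:
    """Smallest stake making a single cheat unprofitable for a risk-neutral agent."""
    return gain_per_cheat * (1.0 - challenge_rate) / challenge_rate


print(p_caught(0.01, 100))   # about 0.63 after 100 cheats
print(min_stake(1.0, 0.01))  # about 99x the per-run gain
```

At a 1% rate the stake must sit roughly two orders of magnitude above the per-run gain, which is the practical cost behind "Requires staking mechanism".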

Architecture E: Hybrid (Layered Verification)

┌─────────────────────────────────────────────────────────────────┐
│                          MAXWELL                                 │
│                                                                  │
│  Layer 1: Thermodynamic (Always On)                             │
│  ├─ Power draw must match claimed computation class             │
│  └─ Blocks obvious cheats (mining, loops) instantly             │
│                            │                                     │
│                            ▼                                     │
│  Layer 2: PCIe Attestation (Always On)                          │
│  ├─ Tensor hashes at bus boundary                               │
│  └─ Timing signatures must match model profile                  │
│                            │                                     │
│                            ▼                                     │
│  Layer 3: Selective ZK (High-Value Only)                        │
│  ├─ For bids above threshold, require ZK proof                  │
│  └─ Proof of specific layer execution                           │
│                            │                                     │
│                            ▼                                     │
│  Layer 4: Random Deep Audit (Rare)                              │
│  ├─ Full inference re-execution by Maxwell                      │
│  └─ Compare outputs — catch statistical anomalies               │
└─────────────────────────────────────────────────────────────────┘

Pros: Defense in depth, cost-proportional verification
Cons: Complex to implement and tune thresholds
Note: LEVERAGES ALL MAXWELL CAPABILITIES
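Layer 4 above assumes full re-execution. For linear layers, which dominate inference cost, a cheaper audit exists: Freivalds' algorithm, a classical randomized check not named in this directive and offered here as a candidate. If the agent commits to a layer's input activations A and raw output C as fixed-point integers, and Maxwell knows the weights B (per the scope note that weights are fixed and known), Maxwell can check C = A·B in O(n²) per trial instead of recomputing in O(n³). A wrong C survives one trial with probability at most 1/2. Nonlinear layers are not covered.

```python
# Illustrative sketch of Freivalds' check; names are hypothetical.
from __future__ import annotations

import secrets
from typing import List

Matrix = List[List[int]]


def matvec(m: Matrix, v: list[int]) -> list[int]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def freivalds(a: Matrix, b: Matrix, c: Matrix, trials: int = 20) -> bool:
    """Randomized check that c == a @ b for integer matrices.

    Each trial draws a random 0/1 vector r and compares a @ (b @ r) with c @ r,
    costing O(n^2) instead of the O(n^3) of recomputing a @ b. A wrong c passes
    one trial with probability at most 1/2, so all trials with at most 2**-trials.
    """
    rng = secrets.SystemRandom()
    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(len(c[0]))]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True


acts = [[1, 2], [3, 4]]         # committed input activations (fixed-point integers)
weights = [[5, 6], [7, 8]]      # weights known to Maxwell
claimed = [[19, 22], [43, 50]]  # raw integer accumulator, before fixed-point rescaling
print(freivalds(acts, weights, claimed))                   # True
print(freivalds(acts, weights, [[19, 22], [43, 51]], 64))  # False except with prob 2**-64
```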

For each architecture, assess:

Security guarantees (what attacks does it prevent?)
Performance overhead (latency, throughput impact)
Implementation complexity
Hardware requirements
How it leverages Maxwell's dual-plane control
Maturity of required technology

Step 6: Maxwell-Specific Verification Research

Before examining general gaps, research verification approaches unique to Maxwell's architecture.

6.1 PCIe Bus Instrumentation

Research questions:
- Can we hash tensor data at the PCIe layer without latency penalty?
- What's the signature of "real inference" vs "fake data" at bus level?
- Can DMA patterns distinguish transformer layers from crypto kernels?

Potential approach:
- Firecracker VM boundary gives us natural instrumentation point
- GPU driver shim can intercept CUDA calls
- Compare: hash(input tensors) + timing → expected output hash
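A plain hash lets Maxwell compare whole tensors. A Merkle commitment over bus-sized chunks also lets it challenge a single chunk later and check it against the committed root without re-reading the whole tensor, and leaf hashes can be computed chunk by chunk as data crosses the bus. The 1 MiB chunk size, SHA-256, the leaf/interior prefixes and the promote-the-odd-node rule are assumptions; whether hashing keeps pace with PCIe bandwidth is exactly the open question above and has to be measured.

```python
# Illustrative sketch; chunk size, prefixes and names are assumptions.
from __future__ import annotations

import hashlib

CHUNK = 1 << 20  # assumed 1 MiB, standing in for a DMA transfer granule


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chunk_leaves(tensor: bytes, chunk: int = CHUNK) -> list[bytes]:
    """Leaf hashes, one per chunk; a 0x00 prefix separates leaves from interior nodes."""
    return [_h(b"\x00" + tensor[i:i + chunk]) for i in range(0, len(tensor), chunk)] or [_h(b"\x00")]


def _next_level(level: list[bytes]) -> list[bytes]:
    nxt = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        nxt.append(level[-1])  # promote an unpaired node unchanged
    return nxt


def merkle_root(leaves: list[bytes]) -> bytes:
    level = leaves
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` to the root; flag is True if the sibling is on the left."""
    path, level, i = [], leaves, index
    while len(level) > 1:
        sib = i ^ 1
        if sib < len(level):
            path.append((level[sib], sib < i))
        level, i = _next_level(level), i // 2
    return path


def verify_chunk(chunk_bytes: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = _h(b"\x00" + chunk_bytes)
    for sib, sib_is_left in path:
        node = _h(b"\x01" + sib + node) if sib_is_left else _h(b"\x01" + node + sib)
    return node == root


tensor = bytes(range(256)) * 10            # 2,560-byte stand-in tensor
leaves = chunk_leaves(tensor, chunk=1000)  # three chunks
root = merkle_root(leaves)                 # committed when the transfer completes
print(verify_chunk(tensor[2000:], merkle_proof(leaves, 2), root))  # True
```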

6.2 Thermodynamic Fingerprinting

Research questions:
- How unique is the power signature of a specific model?
- Can we build a "model fingerprint" from power traces?
- What's the granularity? (Per-layer? Per-forward-pass?)
- Can adversaries fake power signatures without doing real work?

Data to gather:
- Power traces for: LLaMA 7B, 13B, 70B; Mistral; Qwen
- Compare: legitimate inference vs crypto mining vs idle loops
- Quantify: false positive/negative rates
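As a baseline for the fingerprinting questions, a power trace can be matched against reference traces by shape alone: resample to a common length, z-normalize, and take the Pearson correlation with each reference. The 256-sample resampling, the 0.9 acceptance threshold and the synthetic traces are illustrative assumptions. Normalizing away amplitude is deliberate, since absolute energy is what Architecture C checks. The sampling rate and averaging of real power telemetry will bound the achievable granularity.

```python
# Illustrative sketch; thresholds, names and traces are hypothetical.
from __future__ import annotations

import math


def _resample(trace: list[float], n: int = 256) -> list[float]:
    """Linearly interpolate a trace to exactly n samples so traces of different lengths compare."""
    if len(trace) == 1:
        return [float(trace[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(trace) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(trace) - 1)
        out.append(trace[lo] + (trace[hi] - trace[lo]) * (pos - lo))
    return out


def _znorm(trace: list[float]) -> list[float]:
    mu = sum(trace) / len(trace)
    sd = math.sqrt(sum((x - mu) ** 2 for x in trace) / len(trace))
    return [(x - mu) / sd for x in trace] if sd > 0 else [0.0] * len(trace)


def correlation(a: list[float], b: list[float], n: int = 256) -> float:
    za, zb = _znorm(_resample(a, n)), _znorm(_resample(b, n))
    return sum(x * y for x, y in zip(za, zb)) / n


def identify(trace: list[float], fingerprints: dict[str, list[float]],
             min_corr: float = 0.9) -> str | None:
    """Name of the best-matching reference trace, or None if nothing clears min_corr."""
    scores = {name: correlation(trace, ref) for name, ref in fingerprints.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_corr else None


# Synthetic references: bursty per-layer load vs. a flat, slightly rippled mining load.
refs = {
    "model-a": [300.0 if (i // 10) % 2 == 0 else 150.0 for i in range(200)],
    "miner": [290.0 + (i % 3) for i in range(200)],
}
observed = [0.95 * w + 5.0 for w in refs["model-a"]]  # same shape, different amplitude
print(identify(observed, refs))      # model-a
print(identify([40.0] * 200, refs))  # None: a flat idle trace matches nothing
```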

6.3 Auction-Integrated Verification

Research questions:
- Can proof submission be part of the bid/auction protocol?
- "Pay-for-verification" model: agents pay to skip proofs?
- Staking mechanism: agents lose stake if challenged and fail?

Economic design:
- Low-value work: thermodynamic check only (cheap)
- Medium-value: PCIe attestation required
- High-value: ZK proof or staked optimistic
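The tiering can be written as a small policy function the auction evaluates per bid, together with the budget rule from the pragmatic constraints (verification cost below the value of the fraud it prevents). The thresholds are hypothetical parameters; tiers are treated as cumulative on the assumption that the always-on layers of Architecture E stay on; and "value of prevented fraud" is modeled as fraud probability times loss, which is also an assumption.

```python
# Illustrative sketch; thresholds, names and the fraud-value model are hypothetical.
from __future__ import annotations

from enum import Enum


class Check(Enum):
    THERMODYNAMIC = "power/thermal consistency"
    PCIE = "PCIe tensor hashes and timing"
    ZK_OR_STAKE = "ZK proof, or staked optimistic execution"


def required_checks(bid_value: float, medium: float, high: float) -> list[Check]:
    """Cumulative verification tiers keyed to bid value."""
    checks = [Check.THERMODYNAMIC]        # low-value work: cheap check only
    if bid_value >= medium:
        checks.append(Check.PCIE)         # medium-value work
    if bid_value >= high:
        checks.append(Check.ZK_OR_STAKE)  # high-value work
    return checks


def worth_verifying(cost: float, fraud_probability: float, fraud_loss: float) -> bool:
    """Spend on a check only if it costs less than the expected fraud it prevents."""
    return cost < fraud_probability * fraud_loss


print([c.name for c in required_checks(50.0, medium=10.0, high=100.0)])
```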

Step 7: Identify General Research Gaps

What doesn't exist yet in the broader ecosystem?

7.1 Technical Gaps

Potential gaps:
- [ ] ZK circuits for attention mechanisms at scale
- [ ] Efficient proof composition for 100+ layer models
- [ ] GPU-native proof generation (not CPU-bound)
- [ ] Incremental proofs for streaming inference
- [ ] Proofs compatible with speculative decoding
- [ ] Power-trace → model identification (for thermodynamic approach)

7.2 Tooling Gaps

Missing tools:
- [ ] Production-ready ONNX → ZK compiler for large models
- [ ] Benchmarking suite for zkML performance
- [ ] Integration with popular serving frameworks (vLLM, TGI)
- [ ] PCIe instrumentation library for tensor hashing
- [ ] Power monitoring SDK for GPU workload fingerprinting

7.3 Maxwell-Specific Gaps

Missing for our architecture:
- [ ] Firecracker ↔ ZK prover integration
- [ ] Energy Wallet binding to proof submission
- [ ] Thermal budget → verification tier mapping
- [ ] Cross-plane (CPU+GPU) attestation protocol

Deliverables

Primary Output: Research Report (15-25 pages)

1. Executive Summary (1 page)
   - Key findings
   - Feasibility verdict for Maxwell specifically
   - Recommended verification architecture

2. Maxwell Context (2 pages)
   - Dual-plane control advantage
   - Thermodynamic coupling opportunity
   - How our architecture differs from external verification

3. Background (3 pages)
   - ZK proof systems primer
   - Verifiable computation state-of-art
   - ML inference characteristics

4. Technical Analysis (8 pages)
   - ZK circuit complexity for neural nets
   - Quantization and precision trade-offs
   - Existing zkML systems evaluation
   - Performance benchmarks
   - PCIe instrumentation feasibility
   - Power-trace fingerprinting analysis

5. Architecture Options for Maxwell (4 pages)
   - Pure ZK, PCIe Attestation, Thermodynamic, Hybrid designs
   - Comparison matrix (overhead vs security vs Maxwell-fit)
   - Which layers of verification to combine

6. Gap Analysis (3 pages)
   - General zkML gaps
   - Maxwell-specific gaps
   - Build vs integrate vs wait recommendations

7. Recommendations (2 pages)
   - Phase 1: What to ship in v1 (thermodynamic + PCIe?)
   - Phase 2: Add selective ZK for high-value
   - Phase 3: Full cryptographic if/when feasible

Appendices:
- Benchmark data
- Code references
- Paper bibliography

Secondary Outputs

Verification Architecture Decision Matrix

Approach	Overhead	Security Level	Maxwell Leverage	Recommended Tier
Thermodynamic	<1%	Low (coarse)	★★★★★	Always-on
PCIe Attestation	~5%?	Medium	★★★★☆	Default
Selective ZK	10-100x	High	★★☆☆☆	High-value only
Full ZK	100-1000x	Cryptographic	★☆☆☆☆	Future research

Proof-of-Concept Scope (prioritized for Maxwell)
- Option A: Thermodynamic verification demo (power trace → model ID)
- Option B: PCIe tensor hashing prototype
- Option C: ZK proof for single attention layer
- Estimated effort for each

Annotated Bibliography
- 15-20 key papers with 2-sentence summaries
- Categorized: ZK, Power Analysis, Hardware Attestation

Quality Checklist

Before considering research complete:

Surveyed ≥5 academic papers on verifiable ML
Evaluated ≥3 existing zkML implementations
Quantified proof overhead vs inference for at least one real model
Analyzed TEE attestation as alternative/complement
Identified specific gaps blocking production deployment
Provided concrete recommendation with rationale
All claims cite sources or include methodology

Research Philosophy

Goldwasser's Principles Applied:

Rigor over hype — ZK has marketing buzz; focus on what's mathematically proven, not promised
Concrete security — State exact assumptions (trusted setup, computational hardness)
Efficiency matters — A proof that takes 1000x inference time is academically interesting but practically useless
Composability — Can proofs for layers compose into proofs for models?

Pragmatic Constraints for Maxwell:

Maxwell verification must be fast (milliseconds) — we're in the auction hot path
Always-on verification (thermodynamic, PCIe) must be <5% overhead
Selective verification (ZK) can be 10-100x if only triggered for high-value bids
Solution must integrate with Firecracker VM boundaries
Must handle 7B+ parameter models (the workloads that justify H100 thermal budget)
Must work with our auction economics — verification cost < value of prevented fraud

Starting Points

Code to Examine

# EZKL - most mature zkML compiler
git clone https://github.com/zkonduit/ezkl
# Look at: examples/, src/circuit/

# RISC Zero - general zkVM
git clone https://github.com/risc0/risc0
# Look at: examples/ml-inference/

# Modulus Labs research
# https://github.com/modulus-labs

Papers to Start With

"ZKML: An Optimizing System for ML Inference in Zero Knowledge" — Current SOTA
"vCNN: Verifiable Convolutional Neural Networks" — Foundational approach
"SafetyNets: Verifiable Execution of Deep Neural Networks" — Interactive proofs
"Giraffe: Full Accounting for Verifiable Outsourcing" — Efficient verification

People to Follow

Howard Wu (zkML pioneer, a]0x)
Jason Morton (EZKL creator)
Daniel Kang (UIUC, zkML research)

Notes

Scope Boundaries:

Focus on inference verification, not training verification
Assume model weights are fixed and known to Maxwell
Don't solve model IP protection (separate problem)
Assume adversarial agents (they will try to cheat)

Maxwell's Unique Position (Critical Context):

MAXWELL CONTROLS BOTH PLANES. This changes everything.

External verification problem:
  "I gave you a black box. Prove it ran correctly."
  → Requires pure cryptographic proofs
  → Very hard

Maxwell's verification problem:
  "I control the CPU, the GPU, the PCIe bus, and the power rails.
   I can instrument anywhere. I have thermal telemetry.
   Prove to ME that YOUR code did what you claimed."
  → Can combine physics + cryptography + instrumentation
  → Much more tractable

Research should exploit this asymmetry.

Key Research Framing:

Don't just ask "Can zkML prove inference?" Also ask:

"Can power traces identify which model ran?"
"Can PCIe timing distinguish inference from mining?"
"Can we combine 3 weak signals into 1 strong guarantee?"

Timeline Consideration:

This field is evolving rapidly. Research from 6 months ago may be outdated. Prioritize:

GitHub repos with recent commits
Papers from 2023-2024
Conversations with active researchers (if accessible)

Honest Assessment Required:

If the answer is "pure ZK isn't feasible today," that's fine — explore what Maxwell-native approaches can achieve. A pragmatic "thermodynamic + PCIe gets us 95% there" recommendation is more valuable than "we need to wait for zkML to mature."

The Thermodynamic Argument (Don't Forget):

"Every Joule wasted on a CPU cycle is a Joule stolen from the H100. Maxwell ensures the CPU only runs logic that deserves to occupy the thermal budget of the rack."

Verification isn't just about cryptographic correctness — it's about economic efficiency in a thermally-coupled system. An agent that lies about its work steals thermal budget from honest agents. This is the motivation.
