Axiom Math at The Montgomery Summit | Fireside Chat

Introduction

This fireside chat features Carina Hong, founder and CEO of Axiom, in conversation with Matt McIlwain, Managing Director at Madrona Venture Group (an investor in Axiom since day one), at The Montgomery Summit. Carina shares her journey from southern China — where she taught herself English to read the Graduate Texts in Mathematics (GTM) — through MIT (math and physics), Oxford as a Rhodes Scholar, and Stanford, where she dropped out of her concurrent PhD and law degree to start Axiom. Inspired by mathematical geniuses like Galois (who died at 21) and Ramanujan (who died around 30 from malnutrition and ill health), Carina is motivated by a haunting question: what fundamental breakthroughs has humanity missed because we are bounded by the scarcity of outlier reasoning minds? Axiom's mission is to build superhuman AI reasoners — starting with the world's first AI Mathematician.

About Axiom and Core Concept

The AI Mathematician

Axiom is building a general AI mathematician — not just a model that solves a narrow class of math problems, but a system that reasons across mathematics broadly. The historical trajectory is striking:

  • 2016: Google began applying deep neural networks to mathematics (pre-Transformer, pre-ChatGPT).
  • 2019: François Charton and Guillaume Lample (Mistral co-founder) showed Transformers could outperform Mathematica on symbolic integration. Charton is now full-time technical staff at Axiom.
  • December 2025 (Putnam Exam): Axiom Prover scored a perfect 120/120 on the Putnam Competition — an exam where the median score among math majors is zero, and only five humans have scored a perfect 120 in 98 years. The top human this year (an MIT undergraduate) scored 110. The best LLM (DeepSeek, per MathArena) scored 103.
  • Late January 2026: Axiom Prover independently solved four open research conjectures sent by professors from Tacoma Israeli University, Boston College, and Williams College — problems the professors themselves could not solve. Reported by Will Knight in WIRED.

Verified AI — The Real Product Insight

The crucial differentiator is not just answer-generation but verifiable correctness. Axiom Prover outputs computer programs for mathematical proofs (in Lean 4). If the program compiles and you see a check mark, the logic is 100% verified — you don’t need to be a math professor to trust the output. This positions Axiom not just as a math tool but as the foundation for verification across the entire generative-AI economy (code verification, chip verification, agent verification).
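To make the check-mark idea concrete, here is a minimal illustrative Lean 4 proof (our own example, not Axiom's output, assuming Mathlib is available). If the file compiles with no errors and no sorry, the kernel has mechanically verified every step — no human review of the reasoning is needed:

```lean
import Mathlib

-- If this compiles, the Lean kernel has checked the entire argument:
-- the check mark is the proof.
theorem two_mul_eq_add (n : ℕ) : 2 * n = n + n := by ring
```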

Axiom Math — Complete Guide: Products, Installation, Setup, Configuration & Prover Use Cases

1. Overview — What is Axiom Math?

Axiom Math (axiommath.ai) is the Silicon Valley company founded by Carina Hong (featured in the Montgomery Summit fireside chat) building AI mathematicians and verified-AI infrastructure. They ship two distinct products developers should know about:

  • AxiomProver — their flagship autonomous multi-agent ensemble theorem prover for Lean 4; the system that scored 12/12 on Putnam 2025. Access: currently internal / partnership-based (not yet a public API); solutions are open-sourced on GitHub.
  • AXLE (Axiom Lean Engine) — public API + Python SDK + CLI exposing the proof-verification and manipulation primitives Axiom uses internally to train AxiomProver. Access: public release — sign up at the console for an API key.

AXLE is what you can install and use today. AxiomProver is what uses AXLE under the hood.

2. AXLE — The Axiom Lean Engine

AXLE provides interactive tools for exploring, validating, and manipulating Lean 4 mathematical proofs. According to the AXLE homepage, “AXLE provides proof verification and manipulation primitives we’ve used across all of our research efforts, including training AI models and AxiomProver’s 12/12 on Putnam 2025.”

Core toolset (all available via Python, CLI, and HTTP API)

  • verify_proof — validate a candidate Lean theorem against a formal statement
  • check — check if Lean code is valid
  • extract_decls / extract_theorems — extract declarations from Lean source
  • rename — rename declarations
  • theorem2lemma / theorem2sorry / sorry2lemma — proof-structure transformations
  • merge — merge proof files
  • simplify_theorems — simplify theorem statements
  • repair_proofs — auto-repair broken proofs
  • have2lemma / have2sorry — convert local have statements
  • disprove — counter-example tooling
  • normalize — canonicalization
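To give a feel for the input/output shape of these primitives, here is a rough local sketch of declaration extraction. This is a naive regex approximation for illustration only — the real extract_theorems runs server-side against the actual Lean elaborator:

```python
import re

def extract_theorem_names(lean_source: str) -> list[str]:
    """Naive stand-in for extract_theorems: list theorem/lemma names.

    Illustrative only; AXLE's tool parses elaborated Lean, not raw text.
    """
    pattern = re.compile(
        r"^\s*(?:theorem|lemma)\s+([A-Za-z_][A-Za-z0-9_'.]*)",
        re.MULTILINE,
    )
    return pattern.findall(lean_source)

src = """
import Mathlib
theorem add_comm2 (a b : Nat) : a + b = b + a := Nat.add_comm a b
lemma one_le_two : (1 : Nat) <= 2 := by decide
"""
print(extract_theorem_names(src))  # ['add_comm2', 'one_le_two']
```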

3. Installation

Requirements

  • Python 3.11+
  • Internet connection (to reach the AXLE API — the Lean compilation runs server-side)

Option A — Install from PyPI (recommended)

pip install axiom-axle

Option B — Install from Source

git clone https://github.com/AxiomMath/axiom-lean-engine.git
cd axiom-lean-engine
pip install -e .

Or directly via SSH:

pip install git+ssh://git@github.com/AxiomMath/axiom-lean-engine

Option C — Development Installation (with dev tools + pre-commit hooks)

git clone https://github.com/AxiomMath/axiom-lean-engine.git
cd axiom-lean-engine
make setup-env

Verify the install

# CLI check
axle --version

# Python import check
python -c "from axle import AxleClient; print('OK')"

4. Setup — API Key & Authentication

AXLE uses API key auth for rate limiting. Without a key, you get 10 concurrent active requests max.

Step 1: Get an API key

Visit the console: https://axle.axiommath.ai/app/console

Step 2: Set it as an environment variable (recommended)

export AXLE_API_KEY=your-api-key

Add to ~/.bashrc, ~/.zshrc, or your project’s .env for persistence.

Step 3 (optional): Pass directly in code

from axle import AxleClient
client = AxleClient(api_key="your-api-key")

5. Configuration

Environment variables

Variable | Default | Description
AXLE_API_KEY | (none) | API key for authentication
AXLE_API_URL | https://axle.axiommath.ai | API server URL
AXLE_TIMEOUT_SECONDS | 1800 | Base retry-window timeout (seconds)
AXLE_MAX_CONCURRENCY | 20 | Max concurrent requests

Example:

export AXLE_API_KEY=your-api-key
export AXLE_API_URL=https://axle.axiommath.ai
export AXLE_TIMEOUT_SECONDS=600
export AXLE_MAX_CONCURRENCY=50

Python client configuration

from axle import AxleClient

# Custom URL, timeout, and concurrency
client = AxleClient(
    api_key="your-api-key",
    url=AxleClient.DEFAULT_URL,
    base_timeout_seconds=600,
    max_concurrency=50,
)

CLI configuration (global options)

# Custom API URL
axle --url https://axle.axiommath.ai check file.lean --environment lean-4.28.0

# JSON output
axle --json check file.lean --environment lean-4.28.0

# Output to file
axle theorem2sorry input.lean --environment lean-4.28.0 -o output.lean

Lean environments

Every API request requires an environment parameter. AXLE supports multiple Lean versions with pre-built dependencies (typically Mathlib). At time of writing, the recommended starter is lean-4.28.0.

Discover available environments:

# CLI
axle environments

# Python
async with AxleClient() as client:
    envs = await client.environments()
    for env in envs:
        print(f"{env.name}: {env.description}")

# HTTP
curl -s https://axle.axiommath.ai/v1/environments | jq

Example environment definitions:

[
  {
    "name": "lean-4.21.0",
    "lean_toolchain": "leanprover/lean4:v4.21.0",
    "imports": "import Mathlib",
    "description": "Lean 4.21.0 with Mathlib"
  },
  {
    "name": "pnt-4.26.0",
    "lean_toolchain": "leanprover/lean4:v4.26.0",
    "repo_url": "https://github.com/AlexKontorovich/PrimeNumberTheoremAnd",
    "imports": "import Mathlib\nimport PrimeNumberTheoremAnd",
    "description": "Lean + Mathlib 4.26.0 with Terence Tao's Prime Number Theorem Project"
  }
]

Custom environments can pin a specific GitHub repo + revision + subdir — useful for working with particular formalization projects.
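A custom, repo-pinned definition might look like the following sketch. The fields shown earlier (name, lean_toolchain, repo_url, imports, description) follow the examples above; the revision field and all values here are our placeholders — consult the AXLE docs for the exact schema:

```json
{
  "name": "my-project-4.28.0",
  "lean_toolchain": "leanprover/lean4:v4.28.0",
  "repo_url": "https://github.com/your-org/your-formalization",
  "revision": "main",
  "imports": "import Mathlib\nimport YourProject",
  "description": "Pinned project environment for reproducible verification"
}
```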

6. Quick Start — First “Hello, Theorem” Check

Python

import asyncio
from axle import AxleClient

async def main():
    async with AxleClient() as client:
        result = await client.check(
            content="import Mathlib\ntheorem citation_needed : 1 + 1 = 2 := by decide",
            environment="lean-4.28.0",
        )
        print(f"Valid: {result.okay}")
        if result.lean_messages.errors:
            print("Errors:", result.lean_messages.errors)

asyncio.run(main())

CLI

# From file
axle check mytheorem.lean --environment lean-4.28.0

# From stdin
printf "def meaning_of_life := 42\n#print meaning_of_life" \
  | axle check - --environment lean-4.28.0

HTTP

curl -s -X POST https://axle.axiommath.ai/api/v1/check \
    -H "Authorization: Bearer $AXLE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "content": "import Mathlib\ntheorem citation_needed : 1 + 1 = 2 := by decide",
      "environment": "lean-4.28.0"
    }' | jq

Try it in-browser first: an interactive Colab demo notebook is linked from the AXLE Quick Start page — handy for kicking the tires before touching pip.

7. The Core Use Case — verify_proof (the Prover Verification Workflow)

This is the heart of the verified-AI loop Carina described in the fireside chat. The flow:

  1. You (or your AI agent) write a formal statement with sorry placeholders — defining what must be proved.
  2. You produce a candidate proof — the actual Lean code attempting the proof.
  3. AXLE compiles both, checks signatures match, runs the kernel, and tells you whether the proof is 100% valid.

Conceptual example

-- formal_statement: defines the theorem signature
import Mathlib
theorem add_comm (a b : Nat) : a + b = b + a := by sorry
-- content: provides the actual proof
import Mathlib
theorem add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b

Python API

result = await client.verify_proof(
    formal_statement="import Mathlib\ntheorem citation_needed : 1 = 1 := by sorry",
    content="import Mathlib\ntheorem citation_needed : 1 = 1 := rfl",
    environment="lean-4.28.0",
    permitted_sorries=["helper"],  # Optional: helper lemmas allowed to remain unproven
    mathlib_options=False,          # Optional: enable conventional Mathlib options
    ignore_imports=False,           # Optional: handle import mismatches
    timeout_seconds=120,            # Optional: max 900s (15 min) for non-admin
)

print(result.okay)         # True if proof is valid
print(result.content)      # The processed Lean code

CLI

# Basic
axle verify-proof statement.lean proof.lean --environment lean-4.28.0

# With permitted sorries
axle verify-proof statement.lean proof.lean \
  --permitted-sorries helper1,helper2 \
  --environment lean-4.28.0

# Pipeline-friendly (proof from stdin)
cat proof.lean | axle verify-proof statement.lean - --environment lean-4.28.0

# Strict mode (non-zero exit on invalid) — perfect for CI
if axle verify-proof statement.lean proof.lean --strict --environment lean-4.28.0 > /dev/null; then
    echo "Proof valid ✅"
else
    echo "Proof invalid ❌"
fi
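
The strict-mode exit code makes this easy to wire into CI. A sketch of a GitHub Actions job (the workflow layout and secret name are our placeholders, not from Axiom's docs; adjust the environment to whatever your project pins):

```yaml
name: verify-proofs
on: [push, pull_request]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install axiom-axle
      - name: Verify proof against statement
        env:
          AXLE_API_KEY: ${{ secrets.AXLE_API_KEY }}
        run: |
          axle verify-proof statement.lean proof.lean --strict \
            --environment lean-4.28.0
```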

HTTP

curl -s -X POST https://axle.axiommath.ai/api/v1/verify_proof \
    -H "Authorization: Bearer $AXLE_API_KEY" \
    -d '{
      "content": "import Mathlib\ntheorem citation_needed : 1 = 1 := rfl",
      "formal_statement": "import Mathlib\ntheorem citation_needed : 1 = 1 := by sorry",
      "environment": "lean-4.28.0"
    }' | jq

Output structure

{
  "okay": false,
  "content": "import Mathlib\n\ntheorem foo : 1 = 1 := rfl\n",
  "lean_messages": { "errors": [], "warnings": [], "infos": [] },
  "tool_messages": {
    "errors": ["Theorem 'foo' does not match expected signature: expected type 2 = 2, got 1 = 1"],
    "warnings": [],
    "infos": []
  },
  "failed_declarations": ["foo"],
  "timings": { "total_ms": 160, "formal_statement_ms": 3, "declarations_ms": 0, "candidate_ms": 28 }
}

Common verification error patterns

Pattern | Meaning
Missing required declaration '{name}' | A symbol in formal_statement is missing from content
Kind mismatch for '{name}': candidate has {X} but expected {Y} | E.g., theorem vs def
Theorem '{name}' does not match expected signature | Theorem type changed
Unsafe/partial function '{name}' detected | Use of a disallowed function
Axiom '{axiom}' is not in the allowed set of standard axioms | Disallowed axiom used
Declaration '{name}' uses 'sorry' which is not allowed | Theorem not actually proven
Candidate uses banned 'open private' command | Disallowed open private
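
When driving an agent loop, these patterns are worth classifying programmatically so the repair step knows what to attempt. A small sketch — the matched phrases follow the table above, but the suggested actions are our own illustration, not an official AXLE policy:

```python
# Map verify_proof tool_messages.errors onto repair actions.
# Error phrasings follow the documented patterns; actions are illustrative.
RULES = [
    ("Missing required declaration", "restore the missing declaration"),
    ("Kind mismatch", "match the declaration kind (theorem vs def)"),
    ("does not match expected signature", "regenerate with the original statement"),
    ("uses 'sorry'", "prove or whitelist the remaining sorry"),
    ("not in the allowed set of standard axioms", "remove the extra axiom"),
]

def classify(error: str) -> str:
    for needle, action in RULES:
        if needle in error:
            return action
    return "regenerate from scratch"

err = "Theorem 'foo' does not match expected signature: expected type 2 = 2, got 1 = 1"
print(classify(err))  # regenerate with the original statement
```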

Important security note

verify_proof trusts the Lean environment for speed. A creative adversary could exploit Lean metaprogramming to make invalid proofs appear valid. For untrusted proofs (e.g., from competitors, untrusted AI outputs), additionally run:

  • lean4checker — Lean FRO’s .olean verifier
  • Comparator — Lean FRO’s gold-standard proof judge
  • SafeVerify — battle-tested public proof checker (used by AxiomProver in the Putnam 2025 verification pipeline)

8. Real-World Case Study — AxiomProver on Putnam 2025

Their public Putnam 2025 repo is the cleanest case study of the prover-plus-verification stack working end-to-end. Headline numbers from the GitHub README:

  • Final score: 12/12 (8 solved during the competition window, the remaining 4 in the days after)
  • AxiomProver is described as “an autonomous multi-agent ensemble theorem prover for Lean 4.21.0”
  • Each solution is a fully verified Lean 4 proof, checked by SafeVerify

Sample proof complexity (from their published metrics):

Problem | Prover time | Tokens | Proof size
2025 A1 | 110 min | 7M | 652 lines, 23 theorems, 561 tactics
2025 A3 | 165 min | 8M | 1,333 lines, 78 theorems, 1,701 tactics
2025 B5 | 354 min | 18M | 1,495 lines, 66 theorems, 1,967 tactics
2025 B6 | 494 min | 21M | 1,019 lines, 30 theorems, 1,052 tactics

Reproducing the verification yourself:

git clone https://github.com/AxiomMath/Putnam2025
cd Putnam2025
lake run verify
# Output:
# Verifying A1 solution ... ✅
# Verifying A2 solution ... ✅
# ... through B6

This is the exact pattern Carina described: AxiomProver generates the proof; a verifier independently confirms it; you trust the check mark, not the model.

9. Practical Patterns for Your Own AI-Verification Pipeline

A reasonable architecture if you want to use AXLE in your own agent loop (mirrors AxiomProver’s design):

┌─────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│   LLM Agent     │ ──▶ │  Lean Code Gen   │ ──▶ │   AXLE check     │
│  (Claude/etc.)  │     │  (formal proof)  │     │  (syntax valid?) │
└─────────────────┘     └──────────────────┘     └────────┬─────────┘
        ▲                                                 │
        │                                                 ▼
        │                                         ┌──────────────────┐
        │       ┌──────────────────────────┐      │ AXLE verify_proof│
        └───────┤  Repair / regenerate on  │◀────┤  (semantic ok?)  │
                │  tool_messages.errors    │     └────────┬─────────┘
                └──────────────────────────┘              │
                                                          ▼
                                                 ┌───────────────────┐
                                                 │    SafeVerify     │
                                                 │   (adversarial)   │
                                                 └───────────────────┘

Tips:

  • Use permitted_sorries to let your agent ship partial proofs and iterate (very useful in RL training loops with verifiable rewards — exactly the RLVR pattern Carina described).
  • Use repair_proofs and simplify_theorems as deterministic post-processing to clean up LLM output before verification.
  • Pin a specific environment (lean-4.28.0 or your custom repo-pinned env) for reproducibility — bumping Lean versions mid-experiment will silently break things.
  • Set realistic timeouts: hard ceiling is 900s (15 min) per request. AxiomProver’s longest Putnam proof took ~8 hours of wall-clock prover time, but each individual verification call is much shorter.

Video – Axiom Math at The Montgomery Summit

Related Sections

From Deterministic to Non-Deterministic Workflows — Why Verification Matters Now

The world has shifted from deterministic, predictable workflows to non-deterministic LLM outputs — essentially conjectures. Just as math conjectures need a prover, LLM-generated code, designs, and agent actions need a verifier. Carina illustrated the risk with a recent OpenClaw anecdote: a user had OpenClaw book a Davos speaking slot for him, only to discover he owed $31,000 he could not afford. As autonomous agents are deployed at scale into safety-critical and regulated industries, one agent should certify another agent’s actions before they propagate.

The “Why Now” — Three Movements Converging in 2024

  1. LLMs became genuinely good at informal reasoning — high-level sketching, planning, and outlining (a capability that solidified only after September 2024).
  2. Lean 4 matured into industry-grade infrastructure. The formal-proof language Lean has been developed since 2019 (notably by undergrads at Imperial College London — many of Carina’s Math Olympiad friends, several now full-time at Axiom). Lean 4’s release in September 2023 enabled engineering-scale work, and the broader mathematical community accepted it in 2024.
  3. RLVR (Reinforcement Learning with Verifiable Rewards) delivered breakthrough gains in coding because code provides instant pass/fail feedback. Math is the other natively verifiable domain — and it’s exactly where Axiom plays.
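
The RLVR mechanic in point 3 reduces to a binary, machine-checkable reward. A toy sketch (the verifier here is a stub; in Axiom's setting the pass/fail signal would come from a Lean kernel check such as verify_proof):

```python
def verifier_passes(candidate: str) -> bool:
    # Stub for a formal check (Lean kernel, test suite); binary by design.
    return candidate.strip().endswith(":= rfl")

def rlvr_reward(candidate: str) -> float:
    # Verifiable reward: 1.0 iff the artifact checks. No partial credit,
    # so the policy is never rewarded for plausible-but-wrong output.
    return 1.0 if verifier_passes(candidate) else 0.0

print(rlvr_reward("theorem t : 1 = 1 := rfl"))       # 1.0
print(rlvr_reward("theorem t : 1 = 1 := by sorry"))  # 0.0
```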

Real-World Applications Already Emerging

  • Hardware/chip verification: GPUs need 100% correctness — there is no partial credit. Verilog/RTL verification is a natural target.
  • CPU hypervisor optimization: AWS recently published 260,000 lines of formal-language code that took world experts 3–5 years to write by hand. Carina notes wryly: “vibe coding has not changed their life.”
  • Beyond SMT/SAT solvers: Traditional formal tools suffer from state-space explosion (a counter value of 100 unrolls 100 times). Lean’s mathematical abstractions and invariants let Axiom Prover bypass exhaustive search — a key technical insight.
  • Historical precedent: The Paris trade union once demanded formal verification of the subway’s automatic switching system; ESA used formal verification on the Ariane project; Boeing and Airbus have applied it for decades.
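
The state-space point can be seen in miniature in Lean itself: an enumeration tactic like decide checks one concrete instance (analogous to unrolling), while induction proves the statement for every n at once, with no unrolling. A small illustrative example (ours, not Axiom's; assumes Mathlib):

```lean
import Mathlib

-- One concrete instance, checked by evaluation (analogous to unrolling):
example : 100 ≤ 100 * 2 := by decide

-- Every instance at once, via an invariant-style argument: no unrolling.
theorem le_double (n : ℕ) : n ≤ n * 2 := by
  induction n with
  | zero => omega
  | succ k ih => omega
```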

The Team — A Magnet for Top Talent

  • Fabian Gloeckle — formerly Director at Facebook AI Research, his prior work was deployed across all of Meta for LLM-generated software testing. He champions Donald Knuth’s literate programming vision: programmers reason in natural language while verified code ships directly to deployment.
  • Professor Ken Ono — former Vice Provost of the University of Virginia and former Vice President of the American Mathematical Society, who left academia to join Axiom after watching Axiom Prover solve number-theory conjectures he could not solve himself. For Ken, this represents a new way of doing mathematics — mathematicians operate at higher abstraction (intuitions about proof direction) while Axiom Prover fills in the low-level details in real time.

The Abstraction Ladder

Carina draws a parallel to programming history: punch cards (1950s) → Fortran → low-level compilers → Python → natural language. Each lift in abstraction freed humans for higher-leverage work. AI mathematicians do the same for mathematics: instead of solving 1–2 Millennium Prize problems in a lifetime, a mathematician like Ken could potentially attempt a hundred. The Hardy–Littlewood–Ramanujan analogy lands beautifully here — Lean is the modern proof assistant that converts intuition into rigorous logic.

The Code-with-Proof Insight

A key technical contribution: rather than running RL with two divergent objective functions (Python code + English math proof), Axiom converts math proofs into formal-language form that structurally resembles Python or C — making joint optimization tractable. This is why Axiom Prover’s capability transfers directly from math to code verification, saturating several code-verification benchmarks without modification.

Conclusion and Key Takeaways

Axiom is not building “another LLM that does math.” It is building the verification substrate for the entire generative-AI economy — starting with the most rigorous testbed humanity has (mathematics) and generalizing outward to code, chips, and autonomous agents.

Key takeaways:

  1. Verified AI is the next frontier. As LLMs flood the world with non-deterministic output, the bottleneck shifts from generation to trustworthy verification.
  2. Math is the proving ground, not the product. Anything reducible to code and logic is a candidate market — chip verification, regulated enterprise code, safety-critical agent operations.
  3. The “why now” is a triple convergence: LLMs reasoning informally + Lean 4 industrial maturity + RLVR breakthroughs — all aligned in 2024.
  4. Abstraction lifts unlock human leverage. Mathematicians stop checking nitty-gritty steps and operate on intuition; Axiom Prover handles the formal details.
  5. Trust enables autonomy. Without verification, high-stakes industries cannot deploy AI agents fully. Axiom is building the agent that certifies other agents.
  6. Recruitment is a leading indicator. When a former AMS Vice President leaves a Vice Provost role to join an 8-month-old company, the technical signal is strong.
