Name: Goedel-LM/Goedel-Prover-V2-32B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: Goedel-LM

Goedel-Prover-V2-32B: State-of-the-Art Formal Theorem Proving

Goedel-Prover-V2-32B is a 32 billion parameter open-source language model developed by Goedel-LM, setting a new benchmark in automated formal proof generation. It is built upon an expert iteration and reinforcement learning pipeline, incorporating several key innovations:

Key Innovations & Capabilities

Scaffolded Data Synthesis: Progressively trains the model on synthetic proof tasks of increasing difficulty.
Verifier-Guided Self-Correction: The model refines its proofs using feedback from Lean's compiler, mimicking human iterative refinement. This mode significantly boosts performance, with the 32B model solving 64 problems on PutnamBench at Pass@64, surpassing DeepSeek-Prover-V2-671B's record of 47 problems.
Model Averaging: Combines multiple model checkpoints for enhanced robustness and overall performance.

Benchmark Performance

Goedel-Prover-V2-32B demonstrates compelling performance across various benchmarks:

Achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA DeepSeek-Prover-V2-671B and Kimina-Prover-72B.
The smaller Goedel-Prover-V2-8B model matches DeepSeek-Prover-V2-671B's performance on MiniF2F while being nearly 100 times smaller.
Secures the top rank on PutnamBench, solving 57 problems at Pass@32 in self-correction mode, significantly outperforming competitors with less computational effort.

New Dataset Release

Goedel-LM has also released MathOlympiadBench, a new benchmark comprising 360 human-verified formalizations of Olympiad-level mathematical competition problems, including IMO problems.

Ideal Use Cases

Automated Theorem Proving: Generating formal proofs for complex mathematical statements.
Mathematical Research: Assisting researchers in formalizing and verifying mathematical conjectures.
Educational Tools: Aiding students and educators in understanding and constructing formal proofs.
Competitive Programming: Solving challenging mathematical competition problems, particularly those requiring formal verification.