Overview
Goedel-Prover-V2-32B: State-of-the-Art Formal Theorem Proving
Goedel-Prover-V2-32B is a 32 billion parameter open-source language model developed by Goedel-LM, setting a new benchmark in automated formal proof generation. It is built upon an expert iteration and reinforcement learning pipeline, incorporating several key innovations:
Key Innovations & Capabilities
- Scaffolded Data Synthesis: Progressively trains the model on synthetic proof tasks of increasing difficulty.
- Verifier-Guided Self-Correction: The model refines its proofs using feedback from Lean's compiler, mimicking human iterative refinement. This mode significantly boosts performance, with the 32B model solving 64 problems on PutnamBench at Pass@64, surpassing DeepSeek-Prover-V2-671B's record of 47 problems.
- Model Averaging: Combines multiple model checkpoints for enhanced robustness and overall performance.
Benchmark Performance
Goedel-Prover-V2-32B demonstrates compelling performance across various benchmarks:
- Achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA DeepSeek-Prover-V2-671B and Kimina-Prover-72B.
- The smaller Goedel-Prover-V2-8B model matches DeepSeek-Prover-V2-671B's performance on MiniF2F while being nearly 100 times smaller.
- Secures the top rank on PutnamBench, solving 57 problems at Pass@32 in self-correction mode, significantly outperforming competitors with less computational effort.
New Dataset Release
Goedel-LM has also released MathOlympiadBench, a new benchmark comprising 360 human-verified formalizations of Olympiad-level mathematical competition problems, including IMO problems.
Ideal Use Cases
- Automated Theorem Proving: Generating formal proofs for complex mathematical statements.
- Mathematical Research: Assisting researchers in formalizing and verifying mathematical conjectures.
- Educational Tools: Aiding students and educators in understanding and constructing formal proofs.
- Competitive Programming: Solving challenging mathematical competition problems, particularly those requiring formal verification.