Goedel-Prover-V2-32B by Goedel-LM is a 32 billion parameter open-source language model series designed for automated formal proof generation, achieving state-of-the-art performance in theorem proving. It utilizes scaffolded data synthesis, verifier-guided self-correction, and model averaging to master complex mathematical theorems. This model excels at formal mathematics tasks, outperforming larger models on benchmarks like MiniF2F and PutnamBench, making it ideal for advanced mathematical reasoning and proof assistance.
Loading preview...
Goedel-Prover-V2-32B: State-of-the-Art Formal Theorem Proving
Goedel-Prover-V2-32B is a 32 billion parameter open-source language model developed by Goedel-LM, setting a new benchmark in automated formal proof generation. It is built upon an expert iteration and reinforcement learning pipeline, incorporating several key innovations:
Key Innovations & Capabilities
- Scaffolded Data Synthesis: Progressively trains the model on synthetic proof tasks of increasing difficulty.
- Verifier-Guided Self-Correction: The model refines its proofs using feedback from Lean's compiler, mimicking human iterative refinement. This mode significantly boosts performance, with the 32B model solving 64 problems on PutnamBench at Pass@64, surpassing DeepSeek-Prover-V2-671B's record of 47 problems.
- Model Averaging: Combines multiple model checkpoints for enhanced robustness and overall performance.
Benchmark Performance
Goedel-Prover-V2-32B demonstrates compelling performance across various benchmarks:
- Achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA DeepSeek-Prover-V2-671B and Kimina-Prover-72B.
- The smaller Goedel-Prover-V2-8B model matches DeepSeek-Prover-V2-671B's performance on MiniF2F while being nearly 100 times smaller.
- Secures the top rank on PutnamBench, solving 57 problems at Pass@32 in self-correction mode, significantly outperforming competitors with less computational effort.
New Dataset Release
Goedel-LM has also released MathOlympiadBench, a new benchmark comprising 360 human-verified formalizations of Olympiad-level mathematical competition problems, including IMO problems.
Ideal Use Cases
- Automated Theorem Proving: Generating formal proofs for complex mathematical statements.
- Mathematical Research: Assisting researchers in formalizing and verifying mathematical conjectures.
- Educational Tools: Aiding students and educators in understanding and constructing formal proofs.
- Competitive Programming: Solving challenging mathematical competition problems, particularly those requiring formal verification.