Goedel-LM/Goedel-Prover-V2-32B

Warm
Public
32B
FP8
32768
License: apache-2.0
Hugging Face
Overview

Goedel-Prover-V2-32B: State-of-the-Art Formal Theorem Proving

Goedel-Prover-V2-32B is a 32 billion parameter open-source language model developed by Goedel-LM, setting a new benchmark in automated formal proof generation. It is built upon an expert iteration and reinforcement learning pipeline, incorporating several key innovations:

Key Innovations & Capabilities

  • Scaffolded Data Synthesis: Progressively trains the model on synthetic proof tasks of increasing difficulty.
  • Verifier-Guided Self-Correction: The model refines its proofs using feedback from Lean's compiler, mimicking human iterative refinement. This mode significantly boosts performance, with the 32B model solving 64 problems on PutnamBench at Pass@64, surpassing DeepSeek-Prover-V2-671B's record of 47 problems.
  • Model Averaging: Combines multiple model checkpoints for enhanced robustness and overall performance.

Benchmark Performance

Goedel-Prover-V2-32B demonstrates compelling performance across various benchmarks:

  • Achieves 88.0% on MiniF2F at Pass@32 in standard mode and 90.4% in self-correction mode, outperforming prior SOTA DeepSeek-Prover-V2-671B and Kimina-Prover-72B.
  • The smaller Goedel-Prover-V2-8B model matches DeepSeek-Prover-V2-671B's performance on MiniF2F while being nearly 100 times smaller.
  • Secures the top rank on PutnamBench, solving 57 problems at Pass@32 in self-correction mode, significantly outperforming competitors with less computational effort.

New Dataset Release

Goedel-LM has also released MathOlympiadBench, a new benchmark comprising 360 human-verified formalizations of Olympiad-level mathematical competition problems, including IMO problems.

Ideal Use Cases

  • Automated Theorem Proving: Generating formal proofs for complex mathematical statements.
  • Mathematical Research: Assisting researchers in formalizing and verifying mathematical conjectures.
  • Educational Tools: Aiding students and educators in understanding and constructing formal proofs.
  • Competitive Programming: Solving challenging mathematical competition problems, particularly those requiring formal verification.