PlanePaper/LEAD-7B
PlanePaper/LEAD-7B is a 7.6-billion-parameter language model fine-tuned with the GRPO-LEAD reinforcement learning pipeline and optimized for efficient, accurate mathematical reasoning. It achieves higher consistency and accuracy on the AIME24 and AIME25 benchmarks while producing significantly shorter reasoning traces than several 14B models, and supports a 131,072-token context window for challenging problem-solving scenarios.
GRPO-LEAD: Efficient Mathematical Reasoning
PlanePaper/LEAD-7B is a 7.6-billion-parameter model developed with the GRPO-LEAD (GRPO with Length-dependent rewards, Explicit penalties, and Advantage reweighting for Difficulty) reinforcement learning pipeline. This fine-tuning approach trains the model toward concise, accurate, and efficient reasoning on complex mathematical tasks.
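The three GRPO-LEAD components named above can be illustrated with a minimal sketch. This is not the paper's exact formulation: the shaping function, `alpha`, and `wrong_penalty` values below are hypothetical, chosen only to show a length-dependent bonus for correct answers, an explicit penalty for incorrect ones, and difficulty-reweighted group-relative advantages.

```python
import math
from statistics import mean, pstdev

def lead_reward(correct, length, group_mean_len, alpha=0.05, wrong_penalty=-1.0):
    # Explicit penalty: incorrect answers receive a fixed negative reward.
    if not correct:
        return wrong_penalty
    # Length-dependent reward (illustrative shaping): correct answers that
    # are shorter than the group's mean length earn a small bonus.
    return 1.0 + alpha * math.tanh((group_mean_len - length) / max(group_mean_len, 1))

def grpo_lead_advantages(rewards, difficulty_weight=1.0):
    # GRPO-style group-relative advantage: normalize rewards within the
    # sampled group, then reweight by a per-problem difficulty factor so
    # harder problems contribute larger gradient signal.
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0
    return [difficulty_weight * (r - mu) / sigma for r in rewards]
```

A shorter correct solution thus outranks a longer correct one, and both outrank an incorrect one; the advantage normalization keeps the group's mean advantage at zero while `difficulty_weight` scales its spread.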
Key Capabilities & Performance
- Superior Mathematical Reasoning: Achieves higher consistency and accuracy on challenging AIME24 and AIME25 datasets.
- Efficiency: Demonstrates significantly shorter average reasoning lengths compared to larger 14B models like DeepSeek-Distilled-14B and Light-R1-14B-DS, indicating more efficient problem-solving.
- Optimized for Difficulty: Trained on a curated dataset, GRPO-LEAD-SFTData, which includes 12,153 high-quality mathematical reasoning samples with a focus on problems with difficulty > 1.
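The difficulty-based curation described above amounts to a threshold filter over rated samples. A minimal sketch, assuming each record carries a numeric `difficulty` field (the schema is an assumption, not documented for GRPO-LEAD-SFTData):

```python
def select_sft_samples(records, min_difficulty=1.0):
    # Keep only samples strictly above the difficulty cutoff, mirroring
    # the "difficulty > 1" curation criterion; the "difficulty" key is a
    # hypothetical field name for illustration.
    return [r for r in records if r.get("difficulty", 0.0) > min_difficulty]
```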
When to Use This Model
- Mathematical Problem Solving: Ideal for applications requiring precise and efficient step-by-step mathematical reasoning.
- Concise Explanations: Suited to scenarios that require not just a correct answer but a streamlined, shorter reasoning trace.
- Research in RLHF for Reasoning: Provides a strong baseline and methodology for further exploration in reinforcement learning for mathematical tasks.
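For the problem-solving use case above, a plain `transformers` loading sketch should work, since the model is a standard causal LM checkpoint. The prompt wording is an assumption (the card does not document a prompt format), so the actual chat template baked into the tokenizer is preferred via `apply_chat_template`:

```python
MODEL_ID = "PlanePaper/LEAD-7B"

def build_prompt(problem):
    # Illustrative instruction wording; the model's expected format is
    # not documented here, so rely on the tokenizer's chat template.
    return f"Solve the following problem step by step.\n\n{problem}"

def generate_solution(problem, max_new_tokens=2048):
    # Imported lazily so the prompt helper stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    messages = [{"role": "user", "content": build_prompt(problem)}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The long `max_new_tokens` budget reflects the model's step-by-step reasoning style; the 131,072-token context leaves ample headroom for lengthy problems.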
For detailed implementation and further exploration, refer to the GitHub Repository and the associated GRPO-LEAD-SFTData dataset.