GRPO-LEAD: Efficient Mathematical Reasoning
PlanePaper/LEAD-7B is a 7.6-billion-parameter model developed with the GRPO-LEAD (GRPO with Length-dependent rewards, Explicit penalties, and Advantage reweighting for Difficulty) reinforcement learning pipeline. This fine-tuning approach trains the model for concise, accurate, and efficient reasoning on complex mathematical tasks.
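The "Advantage reweighting for Difficulty" component can be sketched roughly as follows: advantages are group-normalized as in standard GRPO, then scaled so that harder problems (low pass rate within the sampled group) contribute a stronger gradient signal. The linear weight schedule and its bounds (`w_min`, `w_max`) are illustrative assumptions for this sketch, not the exact scheme used in training.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, pass_rate, w_min=0.5, w_max=2.0):
    """Group-relative advantages (plain GRPO) scaled by a difficulty weight.

    rewards:   per-sample rewards for one group of rollouts on one problem
    pass_rate: fraction of correct rollouts in the group (proxy for difficulty)
    """
    mu, sigma = mean(rewards), pstdev(rewards)
    # Illustrative linear schedule: pass_rate -> 0 (hard) gives w_max,
    # pass_rate -> 1 (easy) gives w_min.
    w = w_min + (w_max - w_min) * (1.0 - pass_rate)
    return [w * (r - mu) / max(sigma, 1e-6) for r in rewards]
```

Note that the reweighting preserves the zero-mean property of group-relative advantages; it only rescales how much each group contributes to the policy update.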
Key Capabilities & Performance
- Superior Mathematical Reasoning: Achieves higher consistency and accuracy on challenging AIME24 and AIME25 datasets.
- Efficiency: Produces significantly shorter average reasoning chains than larger 14B models such as DeepSeek-Distilled-14B and Light-R1-14B-DS, solving problems with fewer tokens.
- Optimized for Difficulty: Trained on a curated dataset, GRPO-LEAD-SFTData, which includes 12,153 high-quality mathematical reasoning samples with a focus on problems with difficulty > 1.
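The "Length-dependent rewards" and "Explicit penalties" components named in the pipeline can be sketched as a single reward function: correct solutions earn more when they are shorter than the group average, and incorrect solutions receive a fixed negative reward regardless of length. The exponential shape and the `alpha`/`penalty` hyperparameters here are illustrative assumptions, not the exact parameterization used by GRPO-LEAD.

```python
import math

def lead_reward(correct, length, mean_len, std_len, alpha=0.5, penalty=-1.0):
    """Length-dependent reward with an explicit wrong-answer penalty.

    correct:  whether the final answer matched the reference
    length:   token length of this solution
    mean_len, std_len: length statistics over the sampled group
    """
    if not correct:
        return penalty  # explicit penalty, independent of length
    z = (length - mean_len) / max(std_len, 1e-6)  # standardized length
    return math.exp(-alpha * z)  # below-average length -> reward above 1
```

Because only the relative ordering of rewards within a group matters for GRPO-style advantages, this shaping pushes the policy toward shorter correct solutions without rewarding brevity on wrong answers.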
When to Use This Model
- Mathematical Problem Solving: Ideal for applications requiring precise and efficient step-by-step mathematical reasoning.
- Concise Explanations: Suited for scenarios that require not just a correct answer but a streamlined, shorter reasoning path.
- Research in RL for Reasoning: Provides a strong baseline and methodology for further work on reinforcement learning for mathematical reasoning tasks.
For detailed implementation and further exploration, refer to the GitHub Repository and the associated GRPO-LEAD-SFTData dataset.