divelab/DAPO_E2H-math-cosine
The divelab/DAPO_E2H-math-cosine model is a 1.5-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by divelab, it specializes in mathematical reasoning: it was trained on the MATH dataset with the E2H (Easy to Hard) curriculum method and uses GRPO-based reinforcement learning to improve performance on complex mathematical problems.
Overview
The divelab/DAPO_E2H-math-cosine model is a specialized 1.5-billion-parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It was fine-tuned on the MATH dataset using the E2H (Easy to Hard Reasoning) training framework, which is built on top of Hugging Face's TRL library, and is designed for strong performance on mathematical reasoning tasks.
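Since the model is a standard fine-tune of Qwen2.5-1.5B-Instruct, it can be loaded with the usual Hugging Face `transformers` auto classes and queried through its chat template. A minimal sketch (the helper function, prompt, and generation settings are illustrative, not part of the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "divelab/DAPO_E2H-math-cosine"

def solve(problem: str, max_new_tokens: int = 512) -> str:
    """Load the model and generate a step-by-step solution for a math problem."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    # Format the question with the Qwen2.5 chat template inherited from the base model.
    messages = [{"role": "user", "content": problem}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Example (downloads the model weights on first call):
# print(solve("If 3x + 7 = 22, what is x?"))
```

Because the instruction-following behavior comes from the Qwen2.5-1.5B-Instruct base, applying the chat template before generation is important; raw-completion prompting may degrade answer quality.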
Key Capabilities
- Enhanced Mathematical Reasoning: Optimized for solving complex mathematical problems through fine-tuning on the MATH dataset.
- GRPO Integration: Incorporates the GRPO method, as introduced in the DeepSeekMath paper, to push the limits of mathematical reasoning.
- Instruction Following: Retains instruction-following capabilities from its base Qwen2.5-1.5B-Instruct model.
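The GRPO method mentioned above replaces a learned value baseline with a group-relative one: for each prompt, several responses are sampled, and each response's reward is normalized against the mean and standard deviation of its group. A minimal, framework-free sketch of that normalization step (function name and the `eps` stabilizer are illustrative; real training uses TRL's GRPO implementation over token-level policy gradients):

```python
def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Compute group-relative advantages as in GRPO (DeepSeekMath):
    each sampled response's reward is standardized against its own group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    # Responses better than the group average get positive advantage,
    # worse ones negative; eps guards against a zero-variance group.
    return [(r - mean) / (std + eps) for r in rewards]

# With binary correctness rewards over 4 sampled solutions:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

For a math task with verifiable answers, the rewards can simply be 1.0 for a correct final answer and 0.0 otherwise, which is what makes this style of RL attractive for the MATH dataset.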
Good For
- Applications requiring robust mathematical problem-solving.
- Research and development in mathematical reasoning with LLMs.
- Tasks benefiting from models trained with curriculum reinforcement learning (E2H).