divelab/DAPO_E2H-math-gaussian_0p5_0p5
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 19, 2026 · Architecture: Transformer
The divelab/DAPO_E2H-math-gaussian_0p5_0p5 model is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen2.5-1.5B-Instruct. Developed by divelab, it targets mathematical reasoning tasks, using the E2H training framework together with the GRPO method, and supports a 32,768-token context window.
Overview
This model, divelab/DAPO_E2H-math-gaussian_0p5_0p5, is a specialized 1.5 billion parameter language model derived from Qwen2.5-1.5B-Instruct. It was fine-tuned on the MATH dataset to strengthen its mathematical reasoning abilities.
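A minimal inference sketch, assuming the checkpoint loads through the Hugging Face transformers library like its Qwen2.5 base model (the generation settings below are illustrative, not documented defaults):

```python
def generate_solution(
    question: str,
    model_id: str = "divelab/DAPO_E2H-math-gaussian_0p5_0p5",
) -> str:
    """Generate a solution to a math question with the fine-tuned model.

    Imports are deferred so the function can be defined without
    transformers/torch installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    # Format the question with the tokenizer's built-in chat template
    # (inherited from Qwen2.5-1.5B-Instruct).
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=512)

    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Usage (downloads the checkpoint on first call):
# print(generate_solution("What is the remainder when 2^10 is divided by 7?"))
```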
Key Capabilities
- Advanced Mathematical Reasoning: Specifically trained to excel in complex mathematical problem-solving.
- E2H Training Framework: Utilizes the E2H framework, which applies easy-to-hard curriculum reinforcement learning to improve LLM reasoning.
- GRPO Method Integration: Incorporates the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, for robust training.
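As a rough illustration of the GRPO idea (not this model's actual training code): for each prompt, a group of candidate responses is sampled and scored, and each response's advantage is its reward normalized against the group's mean and standard deviation, so no separate value model is needed. A minimal sketch:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample std over the group
    if sigma == 0:
        # All responses scored equally: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled solutions to one math problem, scored 0/1 for correctness.
# Correct solutions get positive advantages, incorrect ones negative.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```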
Good for
- Mathematical Problem Solving: Ideal for applications requiring precise and logical mathematical reasoning.
- Research in LLM Training: Useful for researchers exploring advanced reinforcement learning techniques like E2H and GRPO for domain-specific model optimization.
- Educational Tools: Can be integrated into tools designed to assist with or generate solutions for mathematical challenges.