divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75
divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75 is a 1.5-billion-parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by divelab, it specializes in mathematical reasoning, particularly on the GSM8K dataset. It applies the E2H training method on top of the TRL framework, using GRPO to strengthen mathematical problem solving.
Overview
This model, divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75, is a specialized 1.5-billion-parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It has been fine-tuned on the GSM8K dataset to excel at mathematical reasoning tasks.
Key Training Details
The model's training procedure incorporates the E2H method, built on the TRL framework. A central component is GRPO (Group Relative Policy Optimization), a reinforcement-learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO estimates advantages by comparing each sampled response against the other responses generated for the same prompt, removing the need for a separate value (critic) model and making RL fine-tuning more memory-efficient for math-style tasks.
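The core idea of GRPO can be illustrated with a minimal sketch: for each prompt, a group of responses is sampled and scored, and each response's advantage is its reward normalized by the group's mean and standard deviation. This is an illustrative reimplementation of the advantage computation only, not the training code used for this model.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own
    group's statistics instead of using a learned value model."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample std over the group of G responses
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Example: binary correctness rewards for G=4 sampled answers to one prompt.
# Correct answers get positive advantage, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In the full algorithm these advantages weight a clipped policy-gradient objective (as in PPO) plus a KL penalty toward the reference model; the sketch above covers only the group-relative normalization that gives GRPO its name.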
Intended Use
This model is primarily designed for applications requiring robust mathematical reasoning, particularly those involving arithmetic and word problems similar to the GSM8K benchmark. Its fine-tuning process makes it a strong candidate for tasks where accurate numerical and logical deduction is crucial.
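A typical way to query the model is via the standard Transformers generation API with the tokenizer's chat template. The following is a minimal sketch assuming the model repository is available on the Hugging Face Hub under the name above; generation parameters are illustrative defaults, not values recommended by the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75"

def solve(question: str, max_new_tokens: int = 256) -> str:
    """Generate a step-by-step answer to a GSM8K-style word problem."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Calling `solve("Natalia sold clips to 48 of her friends in April, and then half as many in May. How many clips did she sell altogether?")` should return a chain-of-thought style solution, matching the GSM8K word-problem format the model was fine-tuned on.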