divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75

Text generation · Model size: 1.5B · Quantization: BF16 · Context length: 32k · Published: Apr 19, 2026 · Architecture: Transformer

divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75 is a 1.5 billion parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by divelab, this model specializes in mathematical reasoning tasks, particularly on the GSM8K dataset. It was trained with the E2H method on top of TRL, using GRPO to improve mathematical problem-solving capabilities.


Overview

This model, divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75, is a specialized 1.5 billion parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It has been fine-tuned on the GSM8K dataset to excel at mathematical reasoning tasks.

Key Training Details

The model's training procedure incorporates the E2H method, built upon the TRL framework. A significant aspect of its training is the use of GRPO (Group Relative Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This approach aims to enhance the model's ability to tackle complex mathematical problems.
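The core idea of GRPO is to drop the learned value network of PPO and instead compute advantages relative to a group of sampled responses for the same prompt: each response's reward is normalized by the group's mean and standard deviation. A minimal sketch of that normalization step (the function name and epsilon term are illustrative, not from this model's training code):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages for one prompt's sampled responses.

    Each reward is normalized by the mean and (population) standard
    deviation of the whole group, as in GRPO from the DeepSeekMath paper.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With binary correctness rewards (common for GSM8K-style training), correct responses in a mixed group receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the correct solutions without needing a critic.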

Intended Use

This model is primarily designed for applications requiring robust mathematical reasoning, particularly those involving arithmetic and word problems similar to the GSM8K benchmark. Its fine-tuning process makes it a strong candidate for tasks where accurate numerical and logical deduction is crucial.
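When evaluating a model like this on GSM8K-style problems, the reference answers follow the dataset's `#### <answer>` convention, so scoring typically means extracting that final number from the generated solution. A minimal sketch of such a parser, assuming the model is prompted to end its solution in the same format (the helper name and fallback behavior are illustrative):

```python
import re

def extract_gsm8k_answer(text):
    """Pull the final numeric answer from a GSM8K-style solution.

    Prefers the canonical "#### <answer>" marker; falls back to the
    last number in the text if the marker is absent.
    """
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")  # strip thousands separators
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None
```

Comparing the extracted string against the reference answer gives the exact-match accuracy usually reported for this benchmark.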