divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75
divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75 is a 1.5-billion-parameter instruction-tuned language model, fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by divelab, it specializes in mathematical reasoning, particularly on the GSM8K dataset. It applies the E2H training method on top of the TRL framework, using GRPO to strengthen mathematical problem solving.
Overview
This model, divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75, is a specialized 1.5-billion-parameter language model derived from Qwen/Qwen2.5-1.5B-Instruct. It has been fine-tuned on the GSM8K dataset to excel at mathematical reasoning tasks.
Key Training Details
The model's training procedure incorporates the E2H method, built on the TRL framework. A central component is GRPO (Group Relative Policy Optimization), a reinforcement-learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO estimates advantages by comparing each sampled response against the other responses generated for the same prompt, removing the need for a separate value (critic) model and making RL fine-tuning more memory-efficient for math-style tasks.
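The core idea of GRPO can be illustrated with a minimal sketch: for each prompt, a group of responses is sampled and scored, and each response's advantage is its reward normalized by the group's mean and standard deviation. This is an illustrative reimplementation of the advantage computation only, not the training code used for this model.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each reward against its own
    group's statistics instead of using a learned value model."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample std over the group of G responses
    return [(r - mu) / (sigma + 1e-8) for r in rewards]

# Example: binary correctness rewards for G=4 sampled answers to one prompt.
# Correct answers get positive advantage, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In the full algorithm these advantages weight a clipped policy-gradient objective (as in PPO) plus a KL penalty toward the reference model; the sketch above covers only the group-relative normalization that gives GRPO its name.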
Intended Use
This model is primarily designed for applications requiring robust mathematical reasoning, particularly those involving arithmetic and word problems similar to the GSM8K benchmark. Its fine-tuning process makes it a strong candidate for tasks where accurate numerical and logical deduction is crucial.
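A typical way to query the model is via the standard Transformers generation API with the tokenizer's chat template. The following is a minimal sketch assuming the model repository is available on the Hugging Face Hub under the name above; generation parameters are illustrative defaults, not values recommended by the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "divelab/DAPO_E2H-gsm8k-gaussian_0p25_0p75"

def solve(question: str, max_new_tokens: int = 256) -> str:
    """Generate a step-by-step answer to a GSM8K-style word problem."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Calling `solve("Natalia sold clips to 48 of her friends in April, and then half as many in May. How many clips did she sell altogether?")` should return a chain-of-thought style solution, matching the GSM8K word-problem format the model was fine-tuned on.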