divelab/DAPO_E2H-countdown-gaussian_0p5_0p5

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 19, 2026 · Architecture: Transformer

The divelab/DAPO_E2H-countdown-gaussian_0p5_0p5 model is a 1.5-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by divelab, it combines the E2H curriculum framework with the GRPO reinforcement learning method, and is trained on the Countdown dataset to strengthen mathematical reasoning.

Overview

This model, divelab/DAPO_E2H-countdown-gaussian_0p5_0p5, is a specialized 1.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, developed by divelab.

Key Capabilities

  • Mathematical Reasoning: The model is specifically fine-tuned on the Countdown dataset, enhancing its capabilities in mathematical problem-solving and reasoning tasks.
  • E2H Framework: Training utilized the E2H (Easy-to-Hard Reasoning) framework, which is designed to improve LLM reasoning through curriculum reinforcement learning.
  • GRPO Method: It incorporates the GRPO (Group Relative Policy Optimization) method introduced in the DeepSeekMath paper, further optimizing its mathematical reasoning performance; a minimal sketch of the group-relative advantage computation follows this list.
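
For context, here is a minimal sketch of the group-relative advantage computation at the core of GRPO, following the DeepSeekMath formulation (the function and variable names are illustrative, not taken from this repository):

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """For a group of G completions sampled from the same prompt, each
    completion's advantage is its reward standardized against the group's
    mean and std, so no learned value function is required."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# Example: rewards for G = 4 sampled answers to one Countdown prompt
# (1.0 = reached the target, 0.0 = did not; values are illustrative).
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))  # ≈ [1., -1., -1., 1.]
```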

Good For

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning, such as solving arithmetic problems, logical puzzles, or tasks from the Countdown dataset (a hypothetical answer-checking reward is sketched after this list).
  • Research in Reasoning: Useful for researchers exploring advanced reasoning techniques in language models, particularly curriculum learning and reinforcement learning with verifiable rewards applied to mathematical domains.
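
Because GRPO training on Countdown relies on a rule-based, verifiable reward rather than human preference labels, a reward check might look like the sketch below. The expression format and helper name are assumptions for illustration, not the official dataset schema.

```python
# Hypothetical rule-based reward for a Countdown-style answer: the proposed
# arithmetic expression must use only the given numbers (each at most once)
# and evaluate to the target. The format is an assumption, not the real schema.
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("disallowed expression")
    try:
        tree = ast.parse(expr, mode="eval").body
        pool = list(numbers)
        for node in ast.walk(tree):
            if isinstance(node, ast.Constant):
                pool.remove(node.value)  # ValueError if a number is reused or unknown
        return 1.0 if abs(ev(tree) - target) < 1e-6 else 0.0
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0

print(countdown_reward("(50 - 7) * 3 - 25", [3, 7, 25, 50], 104))  # 1.0
```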