divelab/DAPO_E2H-countdown-gaussian_0p5_0p5

Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 19, 2026 · Architecture: Transformer

The divelab/DAPO_E2H-countdown-gaussian_0p5_0p5 model is a 1.5-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. Developed by divelab, it combines the E2H curriculum framework with the GRPO reinforcement learning method, and is trained on the Countdown dataset to strengthen mathematical reasoning.

Overview

This model, divelab/DAPO_E2H-countdown-gaussian_0p5_0p5, is a specialized 1.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Qwen/Qwen2.5-1.5B-Instruct base model, developed by divelab.

Key Capabilities

  • Mathematical Reasoning: The model is specifically fine-tuned on the Countdown dataset, enhancing its capabilities in mathematical problem-solving and reasoning tasks.
  • E2H Framework: Training utilized the E2H (Easy-to-Hard Reasoning) framework, which is designed to improve LLM reasoning through curriculum reinforcement learning.
  • GRPO Method: It incorporates the GRPO (Group Relative Policy Optimization) method introduced in the DeepSeekMath paper, further optimizing its mathematical reasoning performance; a minimal sketch of the group-relative advantage computation follows this list.
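
For context, here is a minimal sketch of the group-relative advantage computation at the core of GRPO, following the DeepSeekMath formulation (the function and variable names are illustrative, not taken from this repository):

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """For a group of G completions sampled from the same prompt, each
    completion's advantage is its reward standardized against the group's
    mean and std, so no learned value function is required."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# Example: rewards for G = 4 sampled answers to one Countdown prompt
# (1.0 = reached the target, 0.0 = did not; values are illustrative).
print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))  # ≈ [1., -1., -1., 1.]
```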

Good For

  • Mathematical Problem Solving: Ideal for applications requiring robust mathematical reasoning, such as solving arithmetic problems, logical puzzles, or tasks from the Countdown dataset (a hypothetical answer-checking reward is sketched after this list).
  • Research in Reasoning: Useful for researchers exploring advanced reasoning techniques in language models, particularly curriculum learning and reinforcement learning with verifiable rewards applied to mathematical domains.
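
Because GRPO training on Countdown relies on a rule-based, verifiable reward rather than human preference labels, a reward check might look like the sketch below. The expression format and helper name are assumptions for illustration, not the official dataset schema.

```python
# Hypothetical rule-based reward for a Countdown-style answer: the proposed
# arithmetic expression must use only the given numbers (each at most once)
# and evaluate to the target. The format is an assumption, not the real schema.
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def countdown_reward(expr: str, numbers: list[int], target: int) -> float:
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("disallowed expression")
    try:
        tree = ast.parse(expr, mode="eval").body
        pool = list(numbers)
        for node in ast.walk(tree):
            if isinstance(node, ast.Constant):
                pool.remove(node.value)  # ValueError if a number is reused or unknown
        return 1.0 if abs(ev(tree) - target) < 1e-6 else 0.0
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0

print(countdown_reward("(50 - 7) * 3 - 25", [3, 7, 25, 50], 104))  # 1.0
```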