divelab/DAPO_E2H-math-gaussian_0p5_0p5
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 19, 2026 · Architecture: Transformer
The divelab/DAPO_E2H-math-gaussian_0p5_0p5 model is a 1.5 billion parameter instruction-tuned causal language model, fine-tuned from Qwen2.5-1.5B-Instruct. Developed by divelab, it targets mathematical reasoning tasks, using the E2H training framework together with the GRPO method, and supports a 32,768-token context window.
Overview
This model, divelab/DAPO_E2H-math-gaussian_0p5_0p5, is a specialized 1.5 billion parameter language model derived from Qwen2.5-1.5B-Instruct. It was fine-tuned on the MATH dataset to strengthen its mathematical reasoning abilities.
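A minimal inference sketch, assuming the checkpoint loads through the Hugging Face transformers library like its Qwen2.5 base model (the generation settings below are illustrative, not documented defaults):

```python
def generate_solution(
    question: str,
    model_id: str = "divelab/DAPO_E2H-math-gaussian_0p5_0p5",
) -> str:
    """Generate a solution to a math question with the fine-tuned model.

    Imports are deferred so the function can be defined without
    transformers/torch installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    # Format the question with the tokenizer's built-in chat template
    # (inherited from Qwen2.5-1.5B-Instruct).
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=512)

    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

# Usage (downloads the checkpoint on first call):
# print(generate_solution("What is the remainder when 2^10 is divided by 7?"))
```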
Key Capabilities
- Advanced Mathematical Reasoning: Specifically trained to excel in complex mathematical problem-solving.
- E2H Training Framework: Utilizes the E2H framework, which applies easy-to-hard curriculum reinforcement learning to improve LLM reasoning.
- GRPO Method Integration: Incorporates the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, for robust training.
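As a rough illustration of the GRPO idea (not this model's actual training code): for each prompt, a group of candidate responses is sampled and scored, and each response's advantage is its reward normalized against the group's mean and standard deviation, so no separate value model is needed. A minimal sketch:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled response's reward
    against the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample std over the group
    if sigma == 0:
        # All responses scored equally: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled solutions to one math problem, scored 0/1 for correctness.
# Correct solutions get positive advantages, incorrect ones negative.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```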
Good for
- Mathematical Problem Solving: Ideal for applications requiring precise and logical mathematical reasoning.
- Research in LLM Training: Useful for researchers exploring advanced reinforcement learning techniques like E2H and GRPO for domain-specific model optimization.
- Educational Tools: Can be integrated into tools designed to assist with or generate solutions for mathematical challenges.