Model Overview
jhn9803/Qwen2.5-MATH-1.5B-Instruct-DAPO-G8 is a 1.5-billion-parameter instruction-tuned model built on the Qwen2.5-Math-1.5B-Instruct base. It is specialized for mathematical reasoning through fine-tuning on the jhn9803/hendrycks-math-with-answers dataset.
Key Capabilities
- Mathematical Reasoning: Optimized for solving mathematical problems, leveraging a dataset specifically curated for this purpose.
- GRPO Training: Trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to enhance its mathematical problem-solving abilities.
- Instruction Following: Designed to follow instructions effectively, making it suitable for interactive mathematical tasks.
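As an instruction-tuned Qwen2.5 model, it expects chat-formatted prompts; in practice you would call the tokenizer's apply_chat_template rather than building strings by hand. A minimal sketch of the ChatML-style format Qwen2.5 chat templates render to (the step-by-step system prompt shown is the one commonly recommended for Qwen2.5-Math, included here as an illustrative assumption):

```python
def build_chatml_prompt(
    question: str,
    system: str = "Please reason step by step, and put your final answer within \\boxed{}.",
) -> str:
    """Render a single-turn prompt in the ChatML-style format used by Qwen2.5 chat templates.

    Illustrative only: in real use, tokenizer.apply_chat_template produces this
    string (plus any model-specific details) from a list of message dicts.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("What is 7 * 8?")
```

The trailing `<|im_start|>assistant\n` leaves the prompt open for the model to generate the assistant turn.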
Training Details
The model was trained using the TRL (Transformer Reinforcement Learning) framework. The GRPO method, which is central to its mathematical performance, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
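GRPO dispenses with a learned value model: for each prompt it samples a group of completions, scores each with a scalar reward, and uses the group's reward statistics as the baseline. For math fine-tuning, a common reward is binary correctness of the final boxed answer. The sketch below is a plain-Python illustration of that idea, not TRL's exact API; the helper names and the exact-string answer match are assumptions (real pipelines typically use more robust answer normalization):

```python
import re
import statistics
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a completion, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1] if matches else None


def correctness_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed final answer matches the reference string."""
    answer = extract_boxed(completion)
    return 1.0 if answer is not None and answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO's group baseline: standardize rewards within one prompt's sample group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]


# Example: four sampled completions for one prompt whose reference answer is "56".
completions = ["... \\boxed{56}", "... \\boxed{54}", "no final answer", "... \\boxed{56}"]
rewards = [correctness_reward(c, "56") for c in completions]
advantages = group_relative_advantages(rewards)
```

In TRL, this corresponds to passing a user-defined reward function to the GRPO trainer; the trainer handles sampling the group and applying the group-normalized advantages in the policy update.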
Good For
- Mathematical Problem Solving: Well suited to applications that need step-by-step solutions to competition-style math problems, in the style of the Hendrycks MATH benchmark it was fine-tuned on.
- Research in Mathematical LLMs: Provides a base for further experimentation and development in mathematical reasoning with language models.