kong3125/Qwen2.5-MATH-1.5B-BASE-RLOO-EP3-LR2e06
The kong3125/Qwen2.5-MATH-1.5B-BASE-RLOO-EP3-LR2e06 model is a fine-tuned version of Qwen's Qwen2.5-Math-1.5B (per the "1.5B-BASE" in its name), optimized for mathematical reasoning tasks. It was trained with the GRPO method, introduced in the DeepSeekMath paper, on the jhn9803/hendrycks-math-with-answers dataset, and is designed for solving complex mathematical problems.
Model Overview
This model, kong3125/Qwen2.5-MATH-1.5B-BASE-RLOO-EP3-LR2e06, is a specialized language model derived from Qwen's Qwen2.5-Math-1.5B. It has been fine-tuned to significantly enhance its mathematical-reasoning capabilities.
Key Differentiators
- Mathematical Reasoning Focus: Specifically fine-tuned on the jhn9803/hendrycks-math-with-answers dataset, making it highly proficient in solving mathematical problems.
- GRPO Training Method: Trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", to optimize its mathematical problem-solving skills.
- TRL Framework: Training was conducted with the TRL (Transformer Reinforcement Learning) library, reflecting the reinforcement-learning approach to fine-tuning.
Use Cases
This model is particularly well-suited for applications requiring strong mathematical reasoning, such as:
- Automated problem-solving in mathematics.
- Educational tools for math assistance.
- Research in AI for mathematical understanding and generation.
Training Details
The model was trained with specific versions of key frameworks:
- TRL: 0.18.0
- Transformers: 4.52.3
- PyTorch: 2.6.0
- Datasets: 2.17.0
- Tokenizers: 0.21.4