shawntzx/Qwen2.5-3B-GRPO-3_13_math
Text Generation · Concurrency Cost: 1 · Model Size: 3.1B · Quant: BF16 · Ctx Length: 32k · Published: Mar 13, 2025 · Architecture: Transformer
shawntzx/Qwen2.5-3B-GRPO-3_13_math is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning capabilities. The model is optimized for complex mathematical problem-solving and logical deduction, making it suitable for applications that require numerical and symbolic reasoning.
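Below is a minimal sketch of running the model for a math prompt with Hugging Face Transformers. It assumes the checkpoint is published on the Hub under the repo id above and inherits the Qwen2.5-Instruct chat template; the system and user messages are illustrative placeholders.

```python
# Minimal usage sketch (assumes the checkpoint is on the Hugging Face Hub
# under this repo id and uses the Qwen2.5-Instruct chat template).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "shawntzx/Qwen2.5-3B-GRPO-3_13_math"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Example math prompt; the wording here is illustrative, not from the model card.
messages = [
    {"role": "system", "content": "You are a helpful assistant skilled at step-by-step mathematical reasoning."},
    {"role": "user", "content": "Solve for x: 3x + 7 = 22. Show your reasoning."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens after the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

With greedy decoding (`do_sample=False`) the model's reasoning chain is deterministic, which is often preferable for checking math answers; sampling parameters can be adjusted for more varied solutions.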