Kudod/NuminaMath-Qwen2.5-1.5B-GRPO-test-v1

Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Jan 27, 2026 · Architecture: Transformer

Kudod/NuminaMath-Qwen2.5-1.5B-GRPO-test-v1 is a 1.5 billion parameter language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning. With a context length of 131072 tokens, it is aimed at tasks requiring advanced mathematical problem-solving and logical deduction.


Model Overview

Kudod/NuminaMath-Qwen2.5-1.5B-GRPO-test-v1 is a specialized language model fine-tuned from the Qwen/Qwen2.5-1.5B-Instruct base model. Its primary distinction is its training methodology, which uses GRPO (Group Relative Policy Optimization). This technique, detailed in the DeepSeekMath paper, is designed to improve a model's proficiency in mathematical reasoning by comparing groups of sampled answers to the same problem rather than relying on a learned value model.
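The core idea of GRPO can be sketched in a few lines: sample a group of completions for each prompt, score each one with a reward (e.g. whether the final answer is correct), and normalize rewards within the group to obtain per-completion advantages. The helper below is a minimal illustration of that group-relative normalization; the function name and reward values are illustrative, not taken from this model's training code.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (the idea from the DeepSeekMath paper). Rewards within one group
# of sampled completions are normalized to zero mean / unit scale,
# so each completion is judged relative to its siblings.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize a group of rewards; eps avoids division by zero."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math problem, rewarded 1.0
# if the final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, which is what pushes the policy toward completions that outperform their own group.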

Key Capabilities

  • Enhanced Mathematical Reasoning: The model's training with GRPO specifically targets and improves its ability to understand and solve complex mathematical problems.
  • Qwen2.5 Architecture: Built upon the Qwen2.5-1.5B-Instruct foundation, it inherits the general language understanding and generation capabilities of the Qwen family.
  • Extended Context Length: Features a substantial context window of 131072 tokens, allowing it to process and reason over lengthy problem descriptions or complex mathematical proofs.

Training Details

The model was fine-tuned using the TRL (Transformer Reinforcement Learning) framework, with TRL 0.25.1, Transformers 4.57.1, and PyTorch 2.9.1. The GRPO method, central to its mathematical performance, is a key innovation from the DeepSeekMath research.
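For readers who want to reproduce a similar setup, TRL ships a `GRPOTrainer`. The fragment below is only a sketch of how such a fine-tune might be configured; the dataset, reward function, and hyperparameters are placeholders and do not reflect this model's actual training recipe.

```python
# Hypothetical GRPO fine-tuning setup with TRL (config sketch only;
# hyperparameters and reward function are illustrative).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# A reward function scores each sampled completion; here, a toy
# check for a boxed final answer stands in for a real verifier.
def reward_has_boxed_answer(completions, **kwargs):
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

dataset = load_dataset("AI-MO/NuminaMath-TIR", split="train")  # assumed dataset

training_args = GRPOConfig(
    output_dir="NuminaMath-Qwen2.5-1.5B-GRPO",
    num_generations=8,          # group size per prompt
    max_completion_length=1024,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=reward_has_boxed_answer,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

In GRPO the reward function replaces a learned reward model wherever the reward is programmatically checkable, which is why the method pairs well with math problems that have verifiable final answers.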

Ideal Use Cases

This model is particularly well-suited for applications requiring robust mathematical problem-solving, logical deduction, and handling extensive textual context in technical or scientific domains. Its GRPO-enhanced training makes it a strong candidate for tasks where precise mathematical understanding is critical.
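As a usage note, Qwen instruct models expect ChatML-formatted prompts. In practice you would load the model's tokenizer and call `tokenizer.apply_chat_template`; the hand-rolled helper below only sketches what that wire format looks like, with an example math query (the system prompt is illustrative).

```python
# Illustrative helper that builds a ChatML-style prompt, the format
# used by Qwen instruct models. Prefer tokenizer.apply_chat_template
# in real code; this is just a sketch of the underlying format.
def build_chatml_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a helpful math assistant. Reason step by step.",
    "What is the sum of the first 10 positive integers?",
)
```

The trailing `<|im_start|>assistant\n` leaves the turn open so the model generates the assistant's answer next.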