Name: jaygala24/Qwen2.5-3B-ReMax-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

Overview

The jaygala24/Qwen2.5-3B-ReMax-math-reasoning model is a specialized 3.1 billion parameter language model built upon the Qwen2.5-3B architecture. Its primary distinction lies in its fine-tuning process, which leverages the ReMax reinforcement learning algorithm without a KL penalty using the PipelineRL framework. This targeted training aims to significantly enhance its performance in mathematical reasoning.

Key Capabilities & Training

Mathematical Reasoning Focus: The model was specifically trained on mathematical datasets, including gsm8k_train and math_train, to develop strong problem-solving skills.
ReMax Algorithm: Utilizes the ReMax algorithm with a greedy-decoded response's reward as the baseline for advantages, a key aspect of its reinforcement learning approach.
Performance Benchmarks: Achieves notable pass@k scores on standard mathematical reasoning benchmarks:
- GSM8K (test): 85.99% pass@1, 97.50% pass@32
- MATH-500: 67.36% pass@1, 91.20% pass@32
- Overall: 80.87% pass@1, 95.77% pass@32 (weighted by problem count).
Training Details: Trained with a sequence length of 8192, a learning rate of 1e-06, and utilizing DeepSpeed ZeRO Stage 3 for efficiency.

When to Use This Model

This model is particularly well-suited for applications requiring accurate and robust mathematical problem-solving. Developers should consider jaygala24/Qwen2.5-3B-ReMax-math-reasoning for tasks such as:

Automated math problem solvers.
Educational tools that require step-by-step mathematical reasoning.
Any application where precise numerical and logical deduction is critical.

Overview

Overview

Key Capabilities & Training

When to Use This Model

Full Model Card (README)