jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning

Text generation · Open weights

- Model size: 0.5B
- Quantization: BF16
- Context length: 32k
- Concurrency cost: 1
- Published: Apr 13, 2026
- License: apache-2.0
- Architecture: Transformer

The jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning model is a 0.5-billion-parameter language model fine-tuned from Qwen2.5-0.5B and optimized for mathematical reasoning using the ReMax reinforcement learning algorithm, applied without a KL penalty. It is built to solve arithmetic and mathematical problems step by step, and it performs strongly for its size on benchmarks such as GSM8K and MATH-500. Its primary application is in scenarios requiring accurate, step-by-step mathematical problem-solving.


Model Overview

This model, jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning, is a 0.5-billion-parameter language model fine-tuned from the Qwen2.5-0.5B base model. Its core differentiator is the application of the ReMax reinforcement learning algorithm (without a KL penalty), tailored to enhance mathematical reasoning capabilities. Training used the PipelineRL framework.
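To make the training objective concrete, here is a minimal sketch of the ReMax policy-gradient step. This is an illustration of the algorithm's core idea, not the repository's actual training code (which, per the card, used the PipelineRL framework); the function names are hypothetical.

```python
# Minimal ReMax sketch (hypothetical helper names, not the PipelineRL code).

def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """ReMax advantage: the reward of a greedy-decoded response serves as
    the baseline, so no learned value network is needed."""
    return sampled_reward - greedy_reward


def remax_loss(token_logprobs: list[float], sampled_reward: float,
               greedy_reward: float) -> float:
    """REINFORCE-style loss with the ReMax baseline and no KL penalty:
    minimize the negative advantage-weighted log-likelihood of the
    sampled response."""
    advantage = remax_advantage(sampled_reward, greedy_reward)
    return -advantage * sum(token_logprobs)


# Example: a sampled response that beats the greedy baseline (reward 1 vs 0)
# gets its log-likelihood pushed up.
loss = remax_loss([-0.5, -0.5], sampled_reward=1.0, greedy_reward=0.0)
```

Because the baseline is just one extra greedy decode per prompt, ReMax avoids training a separate value model, which is one reason it suits small models like this 0.5B one.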

Key Capabilities

  • Enhanced Mathematical Reasoning: Fine-tuned on gsm8k_train and math_train datasets to improve problem-solving in mathematics.
  • ReMax Algorithm: Uses the reward of a greedy-decoded response as the advantage baseline, which reduces gradient variance without a learned value network and focuses training on direct reward maximization.
  • Performance on Math Benchmarks: Achieves an overall pass@1 of 47.71% and pass@32 of 86.42% across the GSM8K and MATH-500 test sets, strong results for a model of this size.
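The card does not state exactly how the pass@k numbers were computed, but the standard unbiased estimator (drawing n samples per problem, of which c are correct) is a reasonable reference point:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    responses drawn (without replacement) from n samples is correct,
    given c of the n samples are correct."""
    if n - c < k:
        return 1.0  # not enough incorrect samples to fill k slots
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 32 samples per problem, 16 correct.
p1 = pass_at_k(32, 16, 1)    # = 0.5
p32 = pass_at_k(32, 16, 32)  # = 1.0 (some correct sample always included)
```

The large gap between pass@1 (47.71%) and pass@32 (86.42%) suggests the model often reaches a correct solution within a few dozen samples even when its single greedy attempt fails.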

Good For

  • Mathematical Problem Solving: Ideal for applications requiring accurate, step-by-step solutions to arithmetic and algebraic problems.
  • Research in RL for Reasoning: Provides a practical example of ReMax application for improving specific reasoning skills in LLMs.
  • Resource-Constrained Environments: As a 0.5B parameter model, it offers a compact solution for mathematical reasoning where larger models might be impractical.
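As a rough back-of-envelope check on the resource-constrained claim, the BF16 weight footprint can be estimated directly from the parameter count (weights only; this ignores KV cache, activations, and framework overhead):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Weights-only memory in GiB. BF16 stores 2 bytes per parameter;
    excludes KV cache, activations, and runtime overhead."""
    return n_params * bytes_per_param / 1024**3


# 0.5B parameters in BF16 -> roughly 0.93 GiB of weights.
footprint = weight_memory_gib(0.5e9)
```

At under 1 GiB of weights, the model fits comfortably on consumer GPUs and many CPU-only machines, which is what makes it practical where larger math-reasoning models are not.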