jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning
The jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning model is a 1.5-billion-parameter causal language model based on Qwen2.5, fine-tuned for mathematical reasoning. Developed by jaygala24, it was trained on the GSM8K and MATH datasets using the ReMax reinforcement learning algorithm with no KL penalty. The model is optimized for arithmetic and algebraic problem solving, achieves strong pass@k scores on mathematical benchmarks, and supports a 32768-token context length.
jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning Overview
This model is a specialized 1.5 billion parameter variant of the Qwen2.5-1.5B architecture, fine-tuned by jaygala24 specifically for mathematical reasoning tasks. It utilizes the ReMax reinforcement learning algorithm without a KL penalty, a method designed to enhance performance in specific domains by optimizing directly for reward signals.
Key Capabilities & Training
- Mathematical Reasoning Focus: The model was trained on a combination of the `gsm8k_train` and `math_train` datasets, making it highly proficient at solving arithmetic and algebraic problems.
- Reinforcement Learning: Employs the ReMax algorithm, which uses the reward of a greedy-decoded response as the advantage baseline, combined with a PPO-style policy loss and a KL coefficient of 0.0 (i.e., no KL penalty).
- Performance: Achieves notable pass@k scores on mathematical benchmarks, including 76.71% pass@1 on the GSM8K test set and 57.79% pass@1 on MATH-500; overall pass@32 reaches 94.34% across the combined 1819 problems.
- Context Length: Training used a maximum sequence length of 8192 tokens, accommodating long problem statements and multi-step reasoning traces.
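The ReMax objective described above replaces a learned value-function baseline with the reward of the greedy-decoded response. A minimal pure-Python sketch of that idea is shown below; the function names, the clipping constant, and the use of a PPO-style clipped surrogate are illustrative assumptions, not code from this repository.

```python
import math

def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """ReMax advantage: the reward of a sampled response minus the reward
    of the greedy-decoded response, which acts as the baseline."""
    return sampled_reward - greedy_reward

def ppo_clipped_loss(log_probs, old_log_probs, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate loss for one response, averaged over
    token log-probabilities. A real trainer operates on tensors; this
    scalar version only illustrates the computation."""
    total = 0.0
    for lp, old_lp in zip(log_probs, old_log_probs):
        ratio = math.exp(lp - old_lp)  # importance ratio per token
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # minimize the negative of the clipped objective
        total += -min(ratio * advantage, clipped * advantage)
    return total / len(log_probs)

# Binary correctness reward (hypothetical): sampled answer correct (1.0),
# greedy answer wrong (0.0), so the advantage is positive.
adv = remax_advantage(1.0, 0.0)
loss = ppo_clipped_loss([-0.1, -0.2], [-0.1, -0.2], adv)
```

With identical old and new log-probabilities the importance ratio is 1, so the loss reduces to minus the advantage, which is the expected behavior on the first update of each batch.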
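pass@k metrics like those reported above are commonly computed with the standard unbiased estimator (drawing k samples from n generated attempts, c of which are correct); whether this model card used exactly this estimator is an assumption. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n attempts with c correct,
    is correct. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        # fewer than k incorrect attempts exist, so some sample must be correct
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 32 attempts per problem, 8 correct.
score = pass_at_k(32, 8, 1)  # equals 8/32 = 0.25
```

For k = 1 the estimator reduces to the fraction of correct attempts, c/n, which matches the intuitive reading of pass@1.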
Use Cases
- Automated Math Problem Solving: Ideal for applications requiring accurate step-by-step mathematical reasoning and final answer derivation.
- Educational Tools: Can be integrated into platforms for generating solutions or explanations for math problems.
- Research in RL for Reasoning: Serves as a strong baseline or component for further research into reinforcement learning applications for complex reasoning tasks.