jaygala24/Qwen3-1.7B-ReMax-math-reasoning
jaygala24/Qwen3-1.7B-ReMax-math-reasoning is a fine-tuned version of the Qwen3-1.7B model, optimized for mathematical reasoning tasks. It was trained with the ReMax reinforcement learning algorithm, without a KL penalty, on a combination of the GSM8K and MATH datasets. It achieves an overall pass@1 of 78.32% and pass@32 of 94.72% across the GSM8K and MATH-500 benchmarks. The model is designed for applications requiring accurate step-by-step mathematical problem-solving.
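For readers unfamiliar with the pass@k figures quoted above, the sketch below shows one common way such a metric can be estimated from n sampled answers per problem, of which c are correct. The card does not say which estimator was used for the reported numbers, so the function name `pass_at_k` and the choice of the combinatorial estimator are assumptions made for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem (illustrative, not the card's own script).

    n: number of sampled answers, c: number of correct answers among them,
    k: attempt budget. Returns the probability that at least one of k
    answers drawn without replacement from the n samples is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimate over a benchmark."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

With k = 1 the estimate reduces to the fraction of correct samples, c / n, averaged over problems.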
Model Overview
This model, jaygala24/Qwen3-1.7B-ReMax-math-reasoning, is a specialized fine-tune of the Qwen3-1.7B base model. Its primary focus is improving mathematical reasoning through reinforcement learning with the ReMax algorithm.
Key Capabilities & Training
- Mathematical Reasoning: Specifically fine-tuned to excel at solving mathematical problems, as evidenced by its strong performance on benchmarks like GSM8K and MATH-500.
- ReMax RL Algorithm: Trained with the ReMax algorithm, without a KL penalty. ReMax uses the reward of a greedy-decoded response as the baseline when computing advantages.
- Targeted Datasets: Trained on the gsm8k_train and math_train datasets, ensuring a strong foundation in arithmetic and advanced mathematical concepts.
- Performance Metrics: Achieves an impressive overall pass@1 of 78.32% and pass@32 of 94.72% across 1819 problems from GSM8K and MATH-500 datasets, indicating robust problem-solving ability.
- High Sequence Length: Trained with a sequence length of 8192, allowing for processing longer problem descriptions and reasoning steps.
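The greedy-baseline idea behind ReMax can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the training code used for this model: the helper names `remax_advantage` and `remax_surrogate_loss` are hypothetical, rewards are assumed to be scalar per response, and the loss is a plain REINFORCE-style surrogate with no KL term.

```python
def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """Advantage of a sampled response relative to the greedy-decoded one.

    The reward of the greedy response to the same prompt serves as the
    baseline, so no learned value model is needed.
    """
    return sampled_reward - greedy_reward


def remax_surrogate_loss(token_logprobs: list[float], advantage: float) -> float:
    """REINFORCE-style surrogate loss for one sampled response.

    token_logprobs: log-probabilities the policy assigned to each generated
    token. Minimizing this loss raises the likelihood of responses that beat
    the greedy baseline and lowers it for responses that do worse.
    No KL penalty is added (assumption matching the description above).
    """
    return -advantage * sum(token_logprobs)


# Hypothetical example: the sampled answer is correct (reward 1.0),
# the greedy answer is wrong (reward 0.0).
adv = remax_advantage(sampled_reward=1.0, greedy_reward=0.0)
loss = remax_surrogate_loss([-0.5, -0.25], adv)
```

When the sampled and greedy responses earn the same reward, the advantage is zero and that sample contributes no gradient.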
Should I use this for my use case?
- Yes, if you need: A compact yet powerful model for mathematical problem-solving, arithmetic, and logical reasoning tasks where step-by-step explanations are crucial.
- Yes, if you are: Developing applications that require high accuracy in quantitative analysis or educational tools for math.
- Consider alternatives if: Your primary use case is general-purpose text generation, creative writing, or tasks unrelated to mathematical reasoning, as this model is highly specialized.