Name: jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jaygala24

Model Overview

This model, jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning, is a specialized fine-tuned version of the Qwen3-1.7B base model. Its primary distinction lies in its training methodology, employing Group Relative Policy Optimization (GRPO) with a KL penalty via the PipelineRL framework, specifically targeting mathematical reasoning tasks.

Key Capabilities & Training

Mathematical Reasoning: Optimized for solving complex math problems, as evidenced by its strong performance on benchmarks.
Reinforcement Learning Fine-tuning: Leverages GRPO with a KL coefficient of 0.001 and a PPO policy loss, enhancing its ability to generate correct step-by-step reasoning.
Dataset Focus: Trained on gsm8k_train and math_train datasets, with evaluation on gsm8k_test and math_500.
Performance: Achieves notable pass@k scores, including an 80.07% pass@1 on GSM8K (test) and 69.64% pass@1 on MATH-500, with overall pass@32 reaching 95.16% across both datasets.
Technical Stack: Built using PipelineRL, Transformers, and DeepSpeed (ZeRO Stage 3) for efficient training.

Use Cases

This model is particularly well-suited for applications requiring accurate and detailed mathematical problem-solving. Developers should consider this model for:

Automated Math Tutors: Generating step-by-step solutions for arithmetic and algebraic problems.
Quantitative Analysis: Assisting in tasks that demand precise numerical reasoning.
Educational Tools: Providing explanations and answers to mathematical queries.

Overview

Model Overview

Key Capabilities & Training

Use Cases

Full Model Card (README)