jaygala24/Qwen3-1.7B-GRPO-math-reasoning is a 1.7 billion parameter language model, fine-tuned from Qwen3-1.7B using Group Relative Policy Optimization (GRPO) without a KL penalty. This model is specifically optimized for mathematical reasoning tasks, leveraging datasets like GSM8K and MATH. With a 32768-token context length, it is designed to produce step-by-step reasoning for complex mathematical problems.
Overview
This model, jaygala24/Qwen3-1.7B-GRPO-math-reasoning, is a specialized version of the Qwen3-1.7B base model, fine-tuned for enhanced mathematical reasoning. It was trained with Group Relative Policy Optimization (GRPO), a reinforcement learning technique, applied here without a KL penalty to improve performance on math-related tasks.
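The card does not include usage code; below is a minimal sketch using the standard Hugging Face `transformers` chat API. The prompt wording, generation settings, and helper names are illustrative, not part of the model's documented interface.

```python
MODEL_ID = "jaygala24/Qwen3-1.7B-GRPO-math-reasoning"


def build_messages(problem: str) -> list[dict]:
    # Ask explicitly for step-by-step reasoning, which the model was tuned to produce.
    return [{"role": "user", "content": f"{problem}\nPlease reason step by step."}]


def solve(problem: str, max_new_tokens: int = 1024) -> str:
    # transformers is imported lazily so the prompt helper stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)


if __name__ == "__main__":
    print(solve("A train travels 60 miles in 1.5 hours. What is its average speed?"))
```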
Key Capabilities
- Mathematical Reasoning: Optimized to process and solve mathematical problems, providing step-by-step reasoning.
- GRPO Fine-tuning: Leverages a specific RL algorithm (GRPO with a PPO-style policy loss and a KL coefficient of 0.0) for targeted skill development.
- Extensive Training: Trained on a combination of the `gsm8k_train` and `math_train` datasets, with evaluation on `gsm8k_test` and `math_500`.
- High Context Length: Supports a sequence length of 8192 tokens during training, indicating potential for handling longer problem descriptions.
Good For
- Solving Math Problems: Ideal for applications requiring accurate, reasoned solutions to mathematical queries.
- Research in RL for LLMs: Demonstrates the application of GRPO for fine-tuning language models on specific cognitive tasks.
- Educational Tools: Can be integrated into systems that assist with learning or checking mathematical work.
Training Details
The model was trained with a learning rate of 1e-06 over 1500 steps, using bf16 precision and DeepSpeed ZeRO Stage 3 for efficiency. The training involved 16 rollouts per problem and an effective batch size of 256.
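The group-relative step at the heart of GRPO can be sketched in a few lines: each problem's rollouts form one group, and every rollout's reward is normalized against its own group's mean and standard deviation, so no separate value network is needed. This is a simplified illustration of the advantage computation, not the actual training code:

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each rollout's reward
    against the statistics of its own group (one group per problem)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# e.g. 16 rollouts for one problem, reward 1.0 when the final answer is correct
rewards = [1.0] * 4 + [0.0] * 12
advantages = grpo_advantages(rewards)
```

Rollouts that beat the group average get positive advantages and are reinforced; with the KL coefficient set to 0.0, no penalty pulls the policy back toward the reference model.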