Name: wzx111/Qwen3-1.7B-MATH-GDPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: wzx111

Overview

wzx111/Qwen3-1.7B-MATH-GDPO is a 1.7 billion parameter language model, fine-tuned from the base Qwen/Qwen3-1.7B model. Its primary focus is on mathematical reasoning, achieved through specialized training.

Key Capabilities

Mathematical Reasoning: The model has been fine-tuned on the watermelonhjg/MATH-lighteval-level_2 dataset, making it proficient in solving mathematical problems.
GRPO Training Method: It utilizes the GRPO (Gradient Regularized Policy Optimization) method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), to enhance its mathematical capabilities.
TRL Framework: The training was conducted using the TRL (Transformer Reinforcement Learning) library.

Use Cases

This model is particularly well-suited for applications requiring strong mathematical problem-solving abilities. Developers can integrate it into systems that need to process and generate responses for complex mathematical queries or educational tools focused on math.

Overview

Overview

Key Capabilities

Use Cases

Full Model Card (README)