Kazuki1450/Qwen3-1.7B-Base_csum_6_10_1p0_0p0_1p0_grpo_42_rule
Kazuki1450/Qwen3-1.7B-Base_csum_6_10_1p0_0p0_1p0_grpo_42_rule is a fine-tuned version of the Qwen/Qwen3-1.7B-Base model, developed by Kazuki1450. It was trained with the TRL framework using the GRPO method, a reinforcement learning technique introduced to strengthen mathematical reasoning, and is optimized for advanced mathematical problem-solving on top of the Qwen3-1.7B-Base architecture.
Model Overview
This model, developed by Kazuki1450, is a specialized fine-tuned variant of the Qwen/Qwen3-1.7B-Base language model. It leverages the TRL (Transformers Reinforcement Learning) framework for its training process.
Key Differentiator: GRPO Method
The primary distinction of this model lies in its use of the GRPO (Group Relative Policy Optimization) method during training. GRPO is a reinforcement learning technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), and its use here indicates a strong focus on improving the model's proficiency in complex mathematical reasoning tasks.
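The core idea of GRPO, as described in the DeepSeekMath paper, is to score each sampled completion relative to the other completions drawn for the same prompt, normalizing rewards within the group instead of using a learned value function. A minimal stdlib-only sketch of that normalization step (the function name and binary 0/1 rewards are illustrative, not part of this model's released code):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Turn one group's per-completion rewards into advantages.

    GRPO normalizes each reward against the group's mean and
    standard deviation: advantage = (r - mean) / std.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled answers to one math problem, scored 1.0 if correct, else 0.0.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive positive advantages and incorrect ones negative advantages, and the advantages within each group sum to zero by construction.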
Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (version 0.29.0)
- Optimization Method: GRPO, as detailed in the DeepSeekMath paper.
- Framework Versions:
  - TRL: 0.29.0
  - Transformers: 4.57.3
  - PyTorch: 2.9.0
  - Datasets: 4.0.0
  - Tokenizers: 0.22.1
Potential Use Cases
Given its training with the GRPO method, this model is likely well-suited for applications requiring:
- Mathematical problem-solving
- Logical reasoning in quantitative contexts
- Tasks that benefit from enhanced numerical understanding