Name: Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: Kazuki1450

Overview

This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule, is a fine-tuned variant of Qwen3-1.7B-Base. It has approximately 2 billion parameters (1.7B, per the base model name) and a 32768-token context length. It was developed by Kazuki1450 and trained with the TRL (Transformer Reinforcement Learning) library.
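As a rough illustration of how the model might be used locally, the sketch below loads it with the Transformers library and checks a prompt against the 32768-token context length. It assumes the weights are available on the Hugging Face Hub under the model name given above, which this listing does not confirm; `fits_context` and `generate_text` are hypothetical helper names.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule"
MAX_CONTEXT = 32768  # context length stated in the listing


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the generated tokens fit in the context window."""
    return prompt_tokens + max_new_tokens <= MAX_CONTEXT


def generate_text(prompt: str, max_new_tokens: int = 256) -> str:
    """Generate a continuation of `prompt`.

    Assumption: MODEL_ID resolves on the Hugging Face Hub. The import is kept
    inside the function so the helpers above work without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_tokens = inputs["input_ids"].shape[1]
    if not fits_context(prompt_tokens, max_new_tokens):
        raise ValueError("prompt plus max_new_tokens exceeds the context length")

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because this is a base model rather than a chat model, a plain text prompt such as `generate_text("Question: What is 17 * 24?\nAnswer:")` is likely a better fit than a chat template.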

Key Differentiator: GRPO Training

A core aspect of this model is its training method, GRPO (Group Relative Policy Optimization). The technique was introduced in the DeepSeekMath paper to improve mathematical reasoning in language models: it samples a group of completions for each prompt and scores each completion relative to the rest of its group, which removes the need for a separate value model. This training targets the model at complex numerical and logical problems.
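The central step of GRPO, turning the rewards of one group of sampled completions into advantages, can be sketched as follows. This is a minimal illustration based on the outcome-reward formulation in the DeepSeekMath paper, not the training code used for this model; the function name, the `eps` stabilizer, the `scale_by_std` switch, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8, scale_by_std=True):
    """Compute one advantage per completion from the rewards of a single group.

    Each reward is centered on the group mean. With scale_by_std=True the
    result is also divided by the group's standard deviation; eps (an
    assumption of this sketch) avoids division by zero when all rewards match.
    """
    baseline = mean(rewards)
    centered = [r - baseline for r in rewards]
    if not scale_by_std:
        return centered
    spread = pstdev(rewards)
    return [c / (spread + eps) for c in centered]


# Four completions for one prompt, rewarded 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, so the policy update favors answers that beat the group average. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.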

Use Cases

Given its GRPO training, this model is best suited to:

Mathematical problem-solving: Tasks requiring precise calculations and logical deduction.
Scientific computing: Applications involving quantitative analysis and data interpretation.
Reasoning-intensive tasks: Scenarios where robust logical inference is critical.

Training Details

The model was fine-tuned with TRL. Framework versions: TRL 0.29.0, Transformers 4.57.6, and PyTorch 2.9.0. The training run was logged to Weights & Biases, where its metrics can be viewed publicly.
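To reproduce or debug against the same stack, it can help to compare a local environment with the versions listed above. The sketch below does this with the standard library's `importlib.metadata`; the function name and the choice to ignore local build tags such as `+cu128` are assumptions of the example, and matching versions are not confirmed to be required for inference.

```python
from importlib import metadata

# Versions stated in the listing, keyed by PyPI distribution name.
LISTED_VERSIONS = {"trl": "0.29.0", "transformers": "4.57.6", "torch": "2.9.0"}


def find_version_mismatches(expected, get_version=metadata.version):
    """Return {package: installed_version_or_None} for packages that differ.

    get_version is injectable so the check can be exercised without the
    packages installed; by default it queries the current environment.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Ignore local build tags (for example "2.9.0+cu128") when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches
```

Calling `find_version_mismatches(LISTED_VERSIONS)` returns an empty dictionary when the environment matches the listing, and otherwise maps each differing package to the installed version (or `None` if it is missing).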
