Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule is a roughly 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to enhance mathematical reasoning in large language models, and builds on the base Qwen3 architecture to target tasks that require logical and mathematical problem-solving.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with approximately 2 billion parameters and a 32K-token context window. It was developed by Kazuki1450 and trained using the TRL framework.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It leverages GRPO (Group Relative Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO dispenses with a separate value model: for each prompt it samples a group of completions and scores each one against the group's own reward statistics, which is specifically designed to enhance a model's capabilities in mathematical reasoning and complex problem-solving.
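To illustrate the core idea, here is a minimal, self-contained sketch of GRPO's group-relative advantage computation as described in the DeepSeekMath paper: each completion's reward is normalized against the mean and standard deviation of its own sampling group. This is an illustration only, not code from this model's actual training run (which used the TRL framework); the function name and example rewards are invented for the sketch.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each reward by its group's
    mean and standard deviation (no learned value model needed)."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Hypothetical rule-based rewards for 4 sampled completions of one prompt.
rewards = [1.0, 0.0, 1.0, 0.5]
advs = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and are reinforced; below-average completions get a negative one.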
Capabilities
- Enhanced Mathematical Reasoning: Benefits from GRPO training, suggesting improved performance on tasks requiring logical and mathematical understanding.
- Base Model Strengths: Inherits the foundational capabilities of the Qwen3-1.7B-Base model.
- Text Generation: Capable of generating coherent and contextually relevant text, inherited from the Qwen3 base model.
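A quick-start sketch for loading the model with the Hugging Face `transformers` library is shown below. The prompt and generation settings are illustrative assumptions, not the author's recommended configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Example math-style prompt; this is a base-style model, so plain
# text completion is used rather than a chat template.
prompt = "If x + 3 = 7, then x ="
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```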
When to Use This Model
This model is particularly suitable for applications where:
- You need a compact model (2B parameters) with a focus on improved reasoning, especially in mathematical contexts.
- You are working with tasks that could benefit from the GRPO training approach for better logical consistency.
- You are looking for a fine-tuned Qwen3 variant with specific enhancements for problem-solving.