Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 22, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule is a 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper, and supports a context length of 40,960 tokens, making it suited to tasks that require advanced mathematical reasoning and complex, multi-step problem-solving.

Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule, is a fine-tuned version of Qwen/Qwen3-1.7B-Base with 2 billion parameters and a 40,960-token context length. It was developed by Kazuki1450 using the TRL framework.
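
Assuming the checkpoint is published on the Hugging Face Hub under this identifier, it should load with the standard transformers API. A minimal sketch (BF16 matches the quantization listed in the header):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, per the header metadata
    device_map="auto",           # requires accelerate; places weights automatically
)
```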

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned with GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO estimates advantages from the relative rewards of a group of sampled completions rather than from a separate value model, which makes it a natural fit for rule-based rewards on verifiable tasks such as mathematics. Its use here indicates a specialized focus on improving the model's ability to handle complex reasoning, particularly in mathematical domains.
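
The exact recipe behind this checkpoint is not published, but the card names both GRPO and the TRL framework, which ships a GRPOTrainer. The following is a minimal, illustrative sketch under those assumptions: the prompts, the exact-match reward rule (suggested by the "_rule" suffix in the model name), and all hyperparameters are placeholders, not the author's settings.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Rule-based reward: 1.0 if the reference answer appears in the completion,
# else 0.0. The real reward rule behind this checkpoint is undocumented;
# this exact-match check is only a stand-in.
def rule_based_reward(completions, answer, **kwargs):
    return [1.0 if ref.strip() in completion else 0.0
            for completion, ref in zip(completions, answer)]

# Placeholder prompts with verifiable answers; the actual training data
# for this checkpoint is not published.
train_dataset = Dataset.from_dict({
    "prompt": ["Question: What is 6 * 7?\nAnswer:",
               "Question: If 3x + 5 = 20, what is x?\nAnswer:"],
    "answer": ["42", "5"],
})

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",  # the base checkpoint named in this card
    reward_funcs=rule_based_reward,
    args=GRPOConfig(output_dir="qwen3-1.7b-grpo", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```

For each prompt, the trainer samples a group of `num_generations` completions, scores each with the reward function, and normalizes rewards within the group to obtain advantages, which is what lets GRPO dispense with a learned value model.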

Capabilities

  • Enhanced Reasoning: The application of GRPO suggests improved performance in tasks requiring logical deduction and problem-solving.
  • Large Context Window: A 40,960-token context length allows the model to process and generate longer, more complex inputs and outputs.

Use Cases

This model is particularly well-suited for applications that demand:

  • Mathematical problem-solving (see the inference sketch after this list).
  • Complex logical reasoning.
  • Processing extensive textual information where context retention is crucial.
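
Since the model descends from a base (non-chat) checkpoint, a plain completion-style prompt is the safest bet. Below is a hypothetical inference sketch for the first use case, again assuming the checkpoint is available on the Hub:

```python
import torch
from transformers import pipeline

# Hypothetical usage sketch; greedy decoding keeps math answers deterministic.
generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

result = generator(
    "Question: If 3x + 5 = 20, what is x?\nAnswer:",
    max_new_tokens=128,
    do_sample=False,
)
print(result[0]["generated_text"])
```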