Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 14, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule is an approximately 2-billion-parameter (1.7B) language model fine-tuned from Qwen3-1.7B-Base, with a 40,960-token context length. It was trained with GRPO, a method designed to enhance mathematical reasoning, and is suitable for tasks that require stronger logical and mathematical problem-solving on top of the base Qwen3 architecture.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with approximately 2 billion parameters and a 40,960-token context window. It was developed by Kazuki1450 and fine-tuned using the TRL library.
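The model card does not publish the exact training script, but TRL provides a GRPOTrainer that implements this kind of fine-tune. The following is a minimal sketch under stated assumptions: the dataset, column renaming, reward function, and hyperparameters are illustrative placeholders (the `_rule` suffix in the model name suggests a rule-based reward, but the author's actual rule is not documented).

```python
# Illustrative GRPO fine-tune with TRL; not the author's actual configuration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumed dataset choice: GSM8K, renamed so TRL finds a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def rule_based_reward(completions, **kwargs):
    # Placeholder rule: reward completions that emit GSM8K's final-answer marker.
    return [1.0 if "####" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    num_generations=8,          # completions sampled per prompt (the "group")
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```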

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024) and designed specifically to improve a model's mathematical reasoning abilities.
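At its core, GRPO dispenses with a learned value model: for each prompt it samples a group of completions, scores each with a reward function, and normalizes the rewards within the group to obtain per-completion advantages. A minimal sketch of that normalization step (illustrative, not the training code):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Turn a group's rewards into advantages, GRPO-style.

    rewards: shape (num_generations,), scores for completions
    sampled from the same prompt.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one prompt; two pass a rule-based check.
print(group_relative_advantages(np.array([1.0, 0.0, 1.0, 0.0])))
# -> approximately [ 1., -1.,  1., -1.]
```

Because advantages are relative within each group, completions are pushed apart only by how they compare to their siblings for the same prompt, which is what makes a simple binary rule-based reward workable.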

Intended Use Cases

Given its GRPO-based training, this model is particularly well suited to applications that demand enhanced logical and mathematical problem-solving. It is worth considering for tasks where Qwen3-1.7B-Base falls short in complex reasoning or numerical accuracy, as it aims to provide a more robust foundation for mathematical and logical inference than its base model.

Quick Start

For immediate use, the model can be loaded via the transformers library's pipeline function for text generation, as demonstrated in the model card and sketched below.
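A minimal sketch of that pipeline usage; the prompt and generation parameters here are illustrative assumptions, not values prescribed by the model card.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule",
    torch_dtype="auto",   # picks up the BF16 weights noted in the metadata
    device_map="auto",
)

# Illustrative math prompt; sampling settings are assumptions.
output = generator(
    "Question: If 3x + 7 = 25, what is x?\nAnswer:",
    max_new_tokens=256,
    do_sample=False,
)
print(output[0]["generated_text"])
```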