Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 14, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule is an approximately 2-billion-parameter (1.7B) language model fine-tuned from Qwen3-1.7B-Base, with a 40,960-token context length. It was trained with GRPO, a method designed to enhance mathematical reasoning, and is suitable for tasks that require stronger logical and mathematical problem-solving on top of the base Qwen3 architecture.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with approximately 2 billion parameters and a 40,960-token context window. It was developed by Kazuki1450 and fine-tuned using the TRL library.
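The model card does not publish the exact training script, but TRL provides a GRPOTrainer that implements this kind of fine-tune. The following is a minimal sketch under stated assumptions: the dataset, column renaming, reward function, and hyperparameters are illustrative placeholders (the `_rule` suffix in the model name suggests a rule-based reward, but the author's actual rule is not documented).

```python
# Illustrative GRPO fine-tune with TRL; not the author's actual configuration.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumed dataset choice: GSM8K, renamed so TRL finds a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def rule_based_reward(completions, **kwargs):
    # Placeholder rule: reward completions that emit GSM8K's final-answer marker.
    return [1.0 if "####" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    num_generations=8,          # completions sampled per prompt (the "group")
    max_completion_length=512,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```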

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (Shao et al., 2024) and designed specifically to improve a model's mathematical reasoning abilities.
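At its core, GRPO dispenses with a learned value model: for each prompt it samples a group of completions, scores each with a reward function, and normalizes the rewards within the group to obtain per-completion advantages. A minimal sketch of that normalization step (illustrative, not the training code):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-4) -> np.ndarray:
    """Turn a group's rewards into advantages, GRPO-style.

    rewards: shape (num_generations,), scores for completions
    sampled from the same prompt.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions for one prompt; two pass a rule-based check.
print(group_relative_advantages(np.array([1.0, 0.0, 1.0, 0.0])))
# -> approximately [ 1., -1.,  1., -1.]
```

Because advantages are relative within each group, completions are pushed apart only by how they compare to their siblings for the same prompt, which is what makes a simple binary rule-based reward workable.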

Intended Use Cases

Given its GRPO-based training, this model is particularly well suited to applications that demand enhanced logical and mathematical problem-solving. It is worth considering for tasks where Qwen3-1.7B-Base falls short in complex reasoning or numerical accuracy, as it aims to provide a more robust foundation for mathematical and logical inference than its base model.

Quick Start

For immediate use, the model can be loaded via the transformers library's pipeline function for text generation, as demonstrated in the model card and sketched below.
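A minimal sketch of that pipeline usage; the prompt and generation parameters here are illustrative assumptions, not values prescribed by the model card.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_Fourth_1p0_0p0_1p0_grpo_1_rule",
    torch_dtype="auto",   # picks up the BF16 weights noted in the metadata
    device_map="auto",
)

# Illustrative math prompt; sampling settings are assumptions.
output = generator(
    "Question: If 3x + 7 = 25, what is x?\nAnswer:",
    max_new_tokens=256,
    do_sample=False,
)
print(output[0]["generated_text"])
```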