Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule
Text generation · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Jan 22, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule is a roughly 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to strengthen mathematical reasoning. The model targets tasks that require robust mathematical and logical processing, and it supports a context length of 40,960 tokens.

Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base with approximately 1.7 billion parameters and a 40,960-token context window. It was developed by Kazuki1450 and trained using the TRL framework.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It leverages GRPO (Group Relative Policy Optimization), a reinforcement learning method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." Rather than fitting a separate value network as PPO does, GRPO samples a group of completions for each prompt and scores each completion by how its reward compares to the rest of the group, which makes it a natural fit for rule-based rewards on math problems. This specialized training aims to significantly improve the model's proficiency in mathematical reasoning tasks.
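To make the idea concrete, here is a minimal sketch of the group-relative advantage computation at the core of GRPO, as described in the DeepSeekMath paper. The function name and the toy binary rewards are invented for illustration; a real training run wraps this in the full clipped policy-gradient objective.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """rewards: shape (num_prompts, group_size), one scalar reward per sampled completion."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Each completion is scored relative to its own group, so no learned
    # value network is needed (unlike PPO).
    return (rewards - mean) / (std + eps)

# Toy example: 2 prompts, 4 sampled completions each, binary correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [1.0, 1.0, 1.0, 1.0]])
print(group_relative_advantages(rewards))
```

Note that the second group, where every completion earns the same reward, yields zero advantage everywhere: GRPO only produces a learning signal when completions within a group differ in quality.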

Training Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Methodology: GRPO, focused on enhancing mathematical reasoning (see the training sketch after this list).
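The exact training configuration and reward functions for this model are not published. The sketch below only shows the general shape of a GRPO run with TRL's GRPOTrainer; the rule-based reward, the one-example dataset, and the hyperparameters are placeholders, not the author's actual setup.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset: GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict({"prompt": ["Question: What is 13 * 7?\nAnswer:"]})

# Invented rule-based reward: 1.0 if the completion contains the correct answer.
def exact_answer_reward(completions, **kwargs):
    return [1.0 if "91" in completion else 0.0 for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=exact_answer_reward,
    args=GRPOConfig(output_dir="grpo-output", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```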

Use Cases

Given its GRPO-based training, this model is particularly well-suited for applications that demand:

  • Mathematical problem-solving
  • Logical reasoning tasks
  • Scientific computing assistance

Developers can integrate this model using the transformers library to generate responses to complex questions, especially those with a mathematical or logical underpinning, as in the example below.
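The following is a minimal inference sketch with transformers. The prompt and generation settings are illustrative assumptions, and because the model is fine-tuned from a base (non-chat) checkpoint, it is prompted with plain text rather than a chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the published BF16 weights
    device_map="auto",
)

# Illustrative math prompt; decoding settings are assumptions, not published defaults.
prompt = "Question: If 3x + 7 = 22, what is x?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```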