Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 22, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to enhance mathematical reasoning. Building on the Qwen3 architecture with a 40,960-token context length, the model is optimized for tasks that require mathematical problem-solving and logical deduction.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule, is a specialized fine-tuned version of Qwen3-1.7B-Base. It retains the base model's 1.7-billion-parameter architecture and supports a context length of 40,960 tokens, making it suitable for processing longer inputs.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The technique is specifically designed to improve a model's mathematical reasoning and complex problem-solving.
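The core idea of GRPO is to sample a group of completions per prompt and normalize each completion's reward against its own group's statistics, replacing the learned value (critic) model used in PPO. A minimal sketch of that advantage computation, following the DeepSeekMath paper (the function name and zero-variance handling here are my own, not from this model's training code):

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantage: normalize each sampled completion's
    reward by the mean and standard deviation of its group, so no
    separate value model is needed."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards)
    if std == 0:  # all rewards identical: this group carries no signal
        return [0.0 for _ in group_rewards]
    return [(r - mean) / std for r in group_rewards]

# Example: four sampled answers to one problem, scored 1 if correct
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, steering the policy toward completions that beat their own group's average.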

Training Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformer Reinforcement Learning) version 0.23.0
  • Core Method: GRPO, focused on enhancing mathematical reasoning.
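The `_rule` suffix in the model name suggests rule-based (verifiable) rewards, a common setup for GRPO math fine-tuning with TRL. A hypothetical sketch of such a reward function, in the shape TRL's `GRPOTrainer` accepts via `reward_funcs`; the matching logic and the `answer` dataset column are assumptions, not details from this model card:

```python
import re

def rule_based_reward(completions, answer, **kwargs):
    """Hypothetical verifiable reward: 1.0 if the last number in a
    completion equals the reference answer, else 0.0.
    `completions` (generated texts) and `answer` (a dataset column)
    are parallel lists, as TRL passes them to reward functions."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards

# Would be wired up roughly as:
#   trl.GRPOTrainer(model=..., reward_funcs=[rule_based_reward], ...)
```

Because the reward is computed by a deterministic rule rather than a learned reward model, it cannot be gamed by reward-model exploits, which is part of why this recipe is popular for math tasks.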

Potential Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for applications requiring:

  • Mathematical problem-solving
  • Logical reasoning tasks
  • Scientific computing assistance
  • Educational tools for math and logic

Developers can integrate this model using the Hugging Face Transformers `pipeline` API for text generation, especially for tasks that benefit from improved reasoning.
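A minimal way to try the model with the `pipeline` API. The prompt template and generation settings below are illustrative choices, not part of the model card, and calling `generate` downloads the full checkpoint:

```python
def build_prompt(question: str) -> str:
    # Simple step-by-step prompt; base-model fine-tunes typically have
    # no chat template, so plain text completion is used here.
    return f"Question: {question}\nLet's think step by step.\nAnswer:"

def generate(question: str,
             model_id: str = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule"):
    from transformers import pipeline  # lazy import; needs transformers + torch
    generator = pipeline("text-generation", model=model_id, torch_dtype="bfloat16")
    out = generator(build_prompt(question), max_new_tokens=256, do_sample=False)
    return out[0]["generated_text"]

# Example usage (downloads the model on first call):
#   print(generate("What is the sum of the first 10 positive integers?"))
```

Greedy decoding (`do_sample=False`) is a reasonable default for math benchmarks, where reproducible answers matter more than diversity.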