Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_42_rule

Text generation · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 27, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_42_rule is a roughly 2 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base using the GRPO method, which targets improved mathematical reasoning. Building on the Qwen3 base architecture, it is suited to tasks that demand logical and mathematical problem-solving, and supports a 32768-token context length for processing extensive inputs.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_42_rule, is a specialized fine-tuned version of the Qwen3-1.7B-Base architecture, developed by Kazuki1450. It features approximately 2 billion parameters and supports a substantial context length of 32768 tokens.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is specifically designed to enhance a model's capabilities in mathematical reasoning and complex problem-solving.
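For reference, the GRPO objective from the DeepSeekMath paper can be sketched (in simplified form, following the paper's notation: $G$ sampled outputs $o_i$ per question $q$, with group-normalized advantages $\hat{A}_{i,t}$):

$$\mathcal{J}_{\mathrm{GRPO}}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}\min\!\left(\frac{\pi_\theta(o_{i,t}\mid q,o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q,o_{i,<t})}\hat{A}_{i,t},\ \mathrm{clip}\!\left(\frac{\pi_\theta(o_{i,t}\mid q,o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q,o_{i,<t})},\,1-\epsilon,\,1+\epsilon\right)\hat{A}_{i,t}\right)-\beta\,\mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta\,\|\,\pi_{\mathrm{ref}}\right]\right]$$

where the advantage is computed from the group of rewards $\{r_j\}_{j=1}^{G}$ rather than a learned value function:

$$\hat{A}_{i,t} = \frac{r_i - \mathrm{mean}(\{r_j\}_{j=1}^{G})}{\mathrm{std}(\{r_j\}_{j=1}^{G})}$$

Dropping the critic in favor of group-relative advantages is what makes GRPO cheaper than PPO for reasoning-style fine-tuning.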

Training Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Fine-tuning Framework: Hugging Face's TRL (Transformer Reinforcement Learning)
  • Methodology: GRPO, focused on improving mathematical reasoning.
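The `_rule` suffix in the model name suggests a rule-based reward signal of the kind commonly paired with TRL's `GRPOTrainer` for math tasks. The exact reward used here is not documented; the function below is a purely hypothetical sketch of what such a rule might look like (name and matching logic are assumptions, not the author's code):

```python
import re

def math_answer_reward(completion: str, target: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the last number in the
    completion matches the target answer string, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == target else 0.0

# Reward fires only when the final number agrees with the target.
print(math_answer_reward("17 * 24 = 408", "408"))   # 1.0
print(math_answer_reward("The answer is 400", "408"))  # 0.0
```

Rule-based rewards like this are attractive for GRPO because they are cheap, deterministic, and hard to reward-hack compared with a learned reward model.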

Use Cases

This model is particularly well-suited for applications that require:

  • Mathematical problem-solving: Leveraging its GRPO training for improved accuracy.
  • Logical reasoning tasks: Benefiting from the enhanced reasoning capabilities.
  • General text generation: Building upon the robust foundation of the Qwen3-1.7B-Base model.

Developers can integrate this model with the transformers library for standard text-generation workflows.
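A minimal loading sketch using the standard transformers API (the prompt is an illustrative example; since this is a base-model fine-tune, plain-text prompting is assumed rather than a chat template):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Example math-reasoning prompt; the model continues the text.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```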