Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule

Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Mar 24, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B-Base using the TRL framework. It was trained with GRPO (Group Relative Policy Optimization), a method designed to enhance mathematical reasoning capabilities, and is therefore suited to tasks requiring robust logical and mathematical problem-solving, such as scientific computing and data analysis.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned version of Qwen/Qwen3-1.7B-Base, with approximately 1.7 billion parameters and a context length of 32768 tokens. It was developed by Kazuki1450 and trained using the Hugging Face TRL (Transformer Reinforcement Learning) framework.

Key Differentiator: GRPO Training

A core aspect of this model is its training methodology, which uses GRPO (Group Relative Policy Optimization). This technique, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), is designed to enhance a model's capabilities on complex mathematical reasoning tasks, making it particularly adept at problems that require logical deduction and numerical precision.
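
For orientation, the sketch below shows what GRPO fine-tuning with TRL looks like. The prompts, reward function, and hyperparameters are illustrative assumptions: the "rule" suffix in the model name suggests a rule-based reward, but the actual reward and training data for this checkpoint are not published here.

```python
# A minimal GRPO fine-tuning sketch with TRL. The prompts and the
# rule-based reward below are toy assumptions, not the recipe actually
# used for this checkpoint.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical math prompts; the real training set is not documented.
train_dataset = Dataset.from_dict(
    {"prompt": ["What is 3 + 6?", "Solve 12 * 7.", "Compute 15 - 4.", "What is 9 / 3?"]}
)

def rule_based_reward(completions, **kwargs):
    # Toy rule: reward completions that contain at least one digit.
    return [1.0 if any(ch.isdigit() for ch in text) else 0.0 for text in completions]

training_args = GRPOConfig(
    output_dir="Qwen3-1.7B-Base-grpo",
    num_generations=4,              # completions sampled per prompt (the "group")
    per_device_train_batch_size=4,  # must be divisible by num_generations
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",   # the base model this checkpoint starts from
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and computes advantages relative to the group's mean reward, which avoids training a separate value model; this is what makes cheap, rule-based rewards like the one sketched above practical.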

Training Details

The model was trained with the following framework versions (a quick environment check follows the list):

  • TRL: 0.29.0
  • Transformers: 4.57.6
  • PyTorch: 2.9.0
  • Datasets: 4.8.2
  • Tokenizers: 0.22.2
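
Reproducing this exact environment is not required for inference, but version drift can explain behavioral differences. The sketch below simply compares locally installed versions against those listed above.

```python
# Compare locally installed library versions against those listed on
# the model card; a mismatch does not prevent inference but can explain
# behavioral differences.
import importlib

expected = {
    "trl": "0.29.0",
    "transformers": "4.57.6",
    "torch": "2.9.0",       # listed as PyTorch on the card
    "datasets": "4.8.2",
    "tokenizers": "0.22.2",
}
for name, want in expected.items():
    got = importlib.import_module(name).__version__
    status = "matches" if got == want else "differs"
    print(f"{name}: card lists {want}, installed {got} ({status})")
```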

Recommended Use Cases

Given its GRPO-enhanced training, this model is particularly well-suited for:

  • Mathematical problem-solving: multi-step tasks that demand rigorous mathematical reasoning.
  • Scientific computing: assisting with calculations, formula derivation, and data interpretation.
  • Logical deduction: applications requiring precise, structured reasoning.

Developers can get started quickly with the transformers text-generation pipeline, as in the sketch below.
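
This is a minimal inference sketch assuming standard transformers pipeline usage; the prompt and generation settings are illustrative rather than published defaults.

```python
import torch
from transformers import pipeline

# Load the checkpoint in BF16, matching the precision listed above.
generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires accelerate; drop this line to run on CPU
)

# An illustrative math prompt; since this is a base-style model, plain
# text completion (no chat template) is assumed here.
prompt = "Question: If 3x + 6 = 21, what is x? Show your reasoning.\nAnswer:"
result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```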