Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 23, 2026 | Architecture: Transformer | Warm

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule is a language model with roughly 1.7 billion parameters (listed as 2B), fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, the reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning. It targets tasks that require mathematical and logical processing. The model has a context length of 32768 tokens.


Overview

This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base, with roughly 1.7 billion parameters (listed as 2B) and a 32768-token context length. It was developed by Kazuki1450 and trained using the TRL (Transformer Reinforcement Learning) library.

Key Differentiator: GRPO Training

A core aspect of this model is its training methodology, which uses GRPO (Group Relative Policy Optimization). This technique, introduced in the DeepSeekMath paper, samples a group of completions for each prompt and scores each one relative to the group's average reward, which removes the need for a separate value model. It was designed to improve mathematical reasoning, so the model is aimed at numerical and logical problems.
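To make the group-relative idea concrete, here is a minimal sketch of how advantages can be computed for the completions sampled from a single prompt. The function name, the `scale_by_std` switch, the `eps` value, and the 0/1 rewards are illustrative assumptions; the card does not state which reward function or GRPO variant was used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, scale_by_std=True, eps=1e-4):
    """Return one advantage per completion sampled for a single prompt.

    Hypothetical helper for illustration, not code from the training run.
    """
    baseline = mean(rewards)  # the group's average reward acts as the baseline
    centered = [r - baseline for r in rewards]
    if not scale_by_std:
        # Some GRPO variants only subtract the group mean.
        return centered
    spread = pstdev(rewards)  # population standard deviation of the group
    return [c / (spread + eps) for c in centered]


# Four completions for one prompt, scored 1.0 if correct and 0.0 otherwise
# (an assumed rule-based reward).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards, scale_by_std=False)
print(advantages)
```

Completions that beat the group average receive a positive advantage and the rest a negative one. When every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.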

Use Cases

Given its specialized training with GRPO, this model is particularly well-suited for:

  • Mathematical problem-solving: Tasks requiring precise calculations and logical deduction.
  • Scientific computing: Applications involving quantitative analysis and data interpretation.
  • Reasoning-intensive tasks: Scenarios where robust logical inference is critical.
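For the use cases above, a minimal sketch of querying the model with the Transformers library might look like the following. The prompt format in `build_prompt`, the greedy decoding settings, and the helper names are assumptions for illustration; since the model is fine-tuned from a base checkpoint, a plain completion prompt is used rather than a chat template.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p0_1p0_grpo_dr_grpo_42_rule"


def build_prompt(question: str) -> str:
    """Plain-text completion prompt; this format is an illustrative assumption."""
    return f"Question: {question}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Hypothetical helper: load the model in BF16 and complete the prompt."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 17 * 24?")` downloads the weights on first use, so it needs network access and enough memory for the BF16 checkpoint.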

Training Details

The model was fine-tuned using TRL. The reported framework versions are TRL 0.29.0, Transformers 4.57.6, and PyTorch 2.9.0. The training run was logged to Weights & Biases, where its metrics are publicly viewable.
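When trying to reproduce the setup, it can help to compare a local environment against the versions listed above. The sketch below uses only the standard library; the PyPI package names (`trl`, `transformers`, `torch`) and both helper functions are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported in the model card; the package names are assumed.
CARD_VERSIONS = {"trl": "0.29.0", "transformers": "4.57.6", "torch": "2.9.0"}


def installed_versions(names):
    """Look up installed versions, using None for packages that are missing."""
    found = {}
    for name in names:
        try:
            # Drop local build suffixes such as "+cu128" before comparing.
            found[name] = version(name).split("+")[0]
        except PackageNotFoundError:
            found[name] = None
    return found


def compare_versions(expected, installed):
    """Return {package: (expected, installed)} for every mismatch."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


mismatches = compare_versions(CARD_VERSIONS, installed_versions(CARD_VERSIONS))
for name, (want, have) in mismatches.items():
    print(f"{name}: card lists {want}, environment has {have}")
```

A mismatch does not necessarily prevent loading the model; it only flags where a local setup differs from the one reported for training.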