Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_1_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 22, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_1_rule is a 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with Group Relative Policy Optimization (GRPO), the reinforcement-learning method introduced in the DeepSeekMath paper, to strengthen its mathematical reasoning. The model is optimized for tasks that demand robust mathematical problem-solving and logical deduction, and is suited to applications where accurate numerical and logical processing is critical.


Model Overview

This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with approximately 2 billion parameters and a 40960-token context length. It was fine-tuned using Hugging Face's TRL (Transformer Reinforcement Learning) framework.
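As a sketch of how the model might be loaded for inference with the Transformers library. The prompt template and generation settings below are illustrative assumptions on our part, not settings published by the author:

```python
def build_prompt(problem: str) -> str:
    """Wrap a math problem in a simple instruction prompt (our convention, not the author's)."""
    return f"Solve the following problem step by step.\n\nProblem: {problem}\nSolution:"


def generate_solution(problem: str, max_new_tokens: int = 512) -> str:
    # transformers is imported inside the function so the pure helper above
    # stays importable even where transformers is not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_1_rule"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Since this is a base-model fine-tune rather than a chat model, a plain completion-style prompt like the one above is likely more appropriate than a chat template.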

Key Differentiator: GRPO Method

A core aspect of this model's training is the application of GRPO (Group Relative Policy Optimization). This method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), scores each sampled completion relative to the other completions in its group rather than against a learned value model, and is designed to significantly improve a model's mathematical reasoning abilities. By incorporating GRPO, this model aims to excel at complex numerical and logical tasks.
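The group-relative idea can be sketched in a few lines of pure Python: for each prompt, several completions are sampled, and each completion's reward is standardized against the group's mean and standard deviation, which serves as the baseline in place of a critic. This is a simplified illustration of the normalization, not the paper's full objective:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own group.

    `rewards` holds the scores of all completions sampled for one prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Completions that beat the group average get positive advantage,
# those below it get negative advantage.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

These per-completion advantages then weight the policy-gradient update, so the model is pushed toward completions that outperform their sampled peers.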

Training Details

The model was fine-tuned with TRL, a library for training language models with reinforcement learning. Framework versions used: TRL 0.23.0, Transformers 4.57.1, PyTorch 2.7.1+cu128, Datasets 4.4.1, and Tokenizers 0.22.1.
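The exact training recipe is not published, but with the TRL version listed above a GRPO run typically follows TRL's `GRPOTrainer` pattern. The sketch below is our reconstruction: the dataset name is a hypothetical placeholder, the hyperparameters are illustrative, and the rule-based reward is only an assumption suggested by the `rule` suffix in the model name:

```python
import re


def exact_answer_reward(completions, answer, **kwargs):
    """Rule-based reward (an assumption, not the author's published rule):
    1.0 if the last number in the completion matches the reference answer,
    else 0.0. TRL forwards extra dataset columns (here `answer`) as kwargs."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards


def train():
    # Heavy imports kept inside the function so the reward rule above is
    # usable without trl/datasets installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Hypothetical dataset with "prompt" and "answer" columns; the author's
    # training data is not public.
    dataset = load_dataset("my-org/math-prompts", split="train")

    config = GRPOConfig(
        output_dir="qwen3-1.7b-grpo",
        num_generations=8,          # completions sampled per prompt (the "group")
        max_completion_length=512,
        learning_rate=1e-6,         # illustrative, not the author's value
        bf16=True,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=exact_answer_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

A verifiable rule-based reward like this avoids training a separate reward model, which pairs naturally with GRPO's critic-free advantage estimate.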

Recommended Use Cases

  • Mathematical Reasoning: Ideal for problems requiring precise calculations, logical deduction, and understanding of mathematical concepts.
  • Scientific Computing: Can be applied to tasks involving data analysis, formula interpretation, and problem-solving in scientific domains.
  • Educational Tools: Potentially useful for generating explanations or solutions for mathematical problems.