Kazuki1450/Qwen3-1.7B-Base_csum_6_10_clean_1p0_0p0_1p0_grpo_42_rule
Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 16, 2026 | Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_clean_1p0_0p0_1p0_grpo_42_rule is a roughly 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, a reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning. The model targets tasks that require logical and mathematical problem-solving and keeps the base Qwen3 architecture.

Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_clean_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base with approximately 2 billion parameters. It was developed by Kazuki1450 and trained using the TRL framework.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to improve its mathematical reasoning abilities.
  • Base Model Foundation: Built upon the robust Qwen3-1.7B-Base, it inherits the general language understanding and generation capabilities of its parent model.

Training Details

The model was trained with the TRL library (version 0.29.0) using the GRPO method. GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models", which suggests this model is tuned for tasks requiring precise logical and numerical processing.
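To make the GRPO idea concrete, the sketch below shows the group-relative advantage step in plain Python: several completions are sampled for the same prompt, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. This illustrates the general technique only, not the training code used for this model. The function name, the `eps` value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    `eps` (an assumed value) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled completions of one prompt,
# e.g. 1.0 if a checker accepted the answer and 0.0 otherwise.
example_rewards = [1.0, 0.0, 0.0, 1.0]
example_advantages = group_relative_advantages(example_rewards)
```

Completions that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.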

Use Cases

This model is best suited to applications where mathematical and logical reasoning matters. Developers can use it for tasks that benefit from stronger problem-solving, potentially including scientific text analysis, data interpretation, or educational tools that require accurate calculations and logical deductions.
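As a starting point for trying the model, here is a minimal sketch that loads it with the Hugging Face `transformers` library and generates a greedy completion. It assumes the checkpoint is available on the Hugging Face Hub under the ID shown and that `torch` and `transformers` are installed. The `build_prompt` helper and its "Question/Answer" format are hypothetical, since the prompt format used in training is not documented here.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_clean_1p0_0p0_1p0_grpo_42_rule"


def build_prompt(question: str) -> str:
    # Hypothetical plain-text prompt; a base model has no guaranteed chat format.
    return f"Question: {question}\nAnswer:"


def generate_answer(question: str, max_new_tokens: int = 128) -> str:
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What is 17 + 25?")` downloads the weights on first use. Output quality depends on how closely the prompt matches the training format, so the result is best treated as a smoke test and not an evaluation.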