Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_boxed_1p0_0p0_1p0_grpo_42_rule
Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 18, 2026 | Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_boxed_1p0_0p0_1p0_grpo_42_rule is a language model fine-tuned from Qwen/Qwen3-1.7B-Base and listed at 2 billion parameters. It was trained with the TRL framework using GRPO, a reinforcement learning method introduced to improve mathematical reasoning. It is aimed at tasks that call for logical and mathematical problem solving, building on the base Qwen3 model.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_boxed_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base, listed at approximately 2 billion parameters with a 32768-token context length. It was developed by Kazuki1450 using the TRL (Transformer Reinforcement Learning) library.

Key Differentiator: GRPO Training

The distinguishing feature of this model is its training method, GRPO (Group Relative Policy Optimization). GRPO was introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). For each prompt it samples a group of completions, scores them, and uses each completion's reward relative to the rest of the group as the advantage signal, which removes the need for a separate value model. The use of GRPO suggests the fine-tuning targeted mathematical and logical reasoning.
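To make the group-relative idea concrete, here is a minimal sketch of the advantage computation at the core of GRPO. It is not the author's training code: the function name, the example rewards (1.0 for a correct answer, 0.0 otherwise), the use of the population standard deviation, and the `eps` value are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is the reward minus the group mean, divided by the group
    standard deviation. `eps` (an assumed value) avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled for one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group average get a positive advantage and are reinforced; those below get a negative one. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.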

Potential Use Cases

  • Mathematical Problem Solving: Due to its GRPO-enhanced training, this model may perform well in tasks involving mathematical reasoning, calculations, and logical deductions.
  • General Text Generation: As a fine-tune of Qwen3-1.7B-Base, it should retain the base model's general language understanding and generation capabilities.
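For either use case, the model can presumably be loaded like any other causal language model on the Hugging Face Hub. The sketch below assumes the repository is publicly available under the ID shown and that the `transformers` and `torch` packages are installed; the helper name `generate`, the default `max_new_tokens`, and the example prompt are illustrative choices, not part of the model card.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_boxed_1p0_0p0_1p0_grpo_42_rule"


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Return the model's continuation of `prompt` (hypothetical helper)."""
    # Imported lazily so that defining the helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed for this model.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate("Compute 3 + 4 + 5 and put the final answer in \\boxed{}."))
```

Because this is a fine-tune of a base model rather than an instruct model, plain-text prompts like the one above are a safer starting point than a chat template.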

Training Details

The model was trained using TRL 0.29.0, Transformers 4.57.3, PyTorch 2.9.0, Datasets 4.0.0, and Tokenizers 0.22.1. Metrics from the training run can be viewed in Weights & Biases.
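When trying to reproduce this environment, it can help to compare the installed packages against the versions listed above. The following sketch uses only the standard library; the helper name `compare_versions` and the report format are assumptions, and the keys are the usual PyPI distribution names for these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the training details, keyed by PyPI distribution name.
EXPECTED = {
    "trl": "0.29.0",
    "transformers": "4.57.3",
    "torch": "2.9.0",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (wanted, installed, matches); installed is None if absent."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu128" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


for package, (wanted, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent inference; the listed versions describe the training setup.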