Kazuki1450/Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 11, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule is a 2 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for assistant-like interactions, particularly in scenarios requiring structured responses or problem-solving.


Overview

This model, developed by Kazuki1450, is a fine-tuned version of the Qwen3-1.7B-Base architecture, featuring approximately 2 billion parameters. It leverages the Transformer Reinforcement Learning (TRL) framework for its training process.
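Since the checkpoint is a standard Transformers model, it can be loaded and queried in the usual way. A minimal sketch, assuming the card's BF16 precision; the prompt and generation settings are illustrative, not the author's recommendations:

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_assistant_1p0_0p0_1p0_grpo_42_rule"

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint in BF16 (matching the card) and generate a completion."""
    # Imports kept local so the sketch reads without the libraries installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Structured, step-by-step prompts play to the model's GRPO math tuning.
    print(generate("Solve step by step: what is 17 * 24?"))
```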

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was specifically trained using the GRPO (Group Relative Policy Optimization) method, as introduced in the "DeepSeekMath" paper, indicating a focus on improving mathematical problem-solving and reasoning.
  • Assistant-like Interactions: The fine-tuning targets conversational, assistant-style applications, producing coherent and relevant responses to user queries.
  • Base Model Adaptability: Built upon Qwen3-1.7B-Base, it inherits the foundational language understanding and generation capabilities of the Qwen family.

Training Details

The training procedure utilized TRL version 0.23.0, with Transformers 4.57.1 and PyTorch 2.7.1+cu128. The application of the GRPO method is a key differentiator, aiming to push the model's performance in complex reasoning tasks, particularly those involving mathematical concepts.
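A GRPO run of this kind can be sketched with TRL's `GRPOTrainer`. The rule-based reward and dataset below are assumptions for illustration only: the "_rule" suffix in the model name suggests a rule-based reward, but its exact form, and the training data, are not documented.

```python
# Hedged sketch of GRPO fine-tuning with TRL; hyperparameters are placeholders.

def rule_based_reward(completions, **kwargs):
    """Toy rule-based reward: 1.0 for completions with a boxed final answer."""
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

def main():
    # Imports kept local so the reward rule reads without TRL installed.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Assumed math dataset; GRPOTrainer expects a "prompt" column.
    train_dataset = load_dataset("openai/gsm8k", "main", split="train")
    train_dataset = train_dataset.map(lambda x: {"prompt": x["question"]})

    args = GRPOConfig(
        output_dir="qwen3-1.7b-grpo",
        num_generations=8,            # completions sampled per prompt (the GRPO "group")
        per_device_train_batch_size=8,
        learning_rate=1e-6,
        bf16=True,                    # matches the BF16 precision listed above
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=rule_based_reward,
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```

GRPO scores each group of sampled completions with the reward function and normalizes rewards within the group, so no separate value model is trained; this is what makes it attractive for compact models like this one.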

Good for

  • Applications requiring a compact language model with improved mathematical reasoning.
  • Developing AI assistants that need to handle structured queries or problem-solving scenarios.
  • Research into the effectiveness of GRPO for fine-tuning base models on specific cognitive tasks.