Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_1_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Jan 14, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_1_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, known for enhancing mathematical reasoning in large language models, and is optimized for assistant-like conversational tasks, building on the base model's capabilities.


Model Overview

This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base model, with approximately 1.7 billion parameters (rounded to 2B in the listing above). It was trained using the TRL (Transformer Reinforcement Learning) library.
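
Since this is a standard Hugging Face checkpoint, it should load with the `transformers` library. The snippet below is a minimal sketch: the model ID comes from this card, while the dtype, device settings, prompt, and generation parameters are illustrative assumptions.

```python
# Minimal loading sketch with Hugging Face transformers.
# Assumes the default repository layout; device_map="auto" requires `accelerate`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_1_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

# Illustrative prompt; any text-generation input works the same way.
inputs = tokenizer("Explain why 17 is a prime number.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```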

Key Training Methodology

A significant aspect of this model's development is the application of GRPO (Group Relative Policy Optimization). This method, introduced in the DeepSeekMath paper, is designed to improve mathematical reasoning capabilities in large language models. The use of GRPO suggests an emphasis on robust, logical response generation, particularly in structured or reasoning-intensive dialogues.
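
The exact training recipe is not published on this card. The sketch below only shows the general shape of GRPO fine-tuning with TRL's `GRPOTrainer`; the dataset and the rule-based reward function are hypothetical stand-ins (the `_rule` suffix in the model name hints at a rule-based reward, but its actual definition is an assumption).

```python
# Sketch of GRPO fine-tuning with TRL; not the author's actual recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def rule_reward(completions, **kwargs):
    # Hypothetical rule-based reward: +1 if the completion ends cleanly
    # with terminal punctuation, 0 otherwise. The real rule is unknown.
    return [1.0 if c.strip().endswith((".", "!", "?")) else 0.0 for c in completions]

# Placeholder prompt dataset; the actual training data is not documented.
dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(output_dir="grpo-out", num_generations=8)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",  # the stated base model
    reward_funcs=rule_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples `num_generations` completions per prompt and normalizes rewards within each group, which is why a single scalar reward function like the one above is sufficient; no separate value model is trained.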

Intended Use

This model is suited to assistant-like applications that require conversational interaction and, potentially, reasoning-heavy responses. Its fine-tuning aims to make it a helpful and coherent assistant, building on the foundational strengths of the Qwen3-1.7B-Base model; a minimal inference sketch follows.
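
For assistant-style use, a chat-formatted prompt is the natural entry point. The snippet below is a sketch that assumes the repository's tokenizer ships a chat template; if the fine-tune inherits the base model's tokenizer without one, plain-text prompting applies instead. It continues from the loading snippet above.

```python
# Assistant-style inference sketch; assumes a chat template is present.
messages = [
    {"role": "user", "content": "Summarize the key idea behind GRPO in two sentences."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```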