Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 22, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to strengthen its reasoning capabilities. It is designed for general text generation tasks, combining its base architecture with this specialized training.


Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule, is a 1.7 billion parameter language model derived from Qwen/Qwen3-1.7B-Base. It was fine-tuned using the TRL framework.

Key Training Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Fine-tuning Framework: TRL
  • Training Method: GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests the fine-tuning targets reasoning tasks, likely mathematical ones.
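The defining idea of GRPO, as described in the DeepSeekMath paper, is that advantages are computed relative to a group of completions sampled for the same prompt, rather than from a learned value model: each completion's reward is normalized by the group's mean and standard deviation. A minimal sketch of that normalization (the function name and the example reward values are illustrative, not taken from this model's training run):

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages for one group of completions sampled
    for the same prompt: normalize each reward by the group's mean
    and standard deviation, so no critic/value model is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, scored by a rule-based reward;
# completions above the group mean get positive advantages.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

The "rule" suffix in the model name plausibly indicates a rule-based reward function of this kind, though the card does not say so explicitly.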

Capabilities

  • Text Generation: Capable of generating human-like text based on given prompts.
  • Reasoning: GRPO training implies a focus on improved reasoning, particularly the mathematical reasoning where DeepSeekMath demonstrated strong results.

Usage

Developers can integrate this model with the transformers library for text generation tasks. It is suited to applications that need a compact yet capable language model with improved reasoning characteristics from its specialized training.
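A minimal loading sketch using the standard transformers causal-LM API (the prompt is an arbitrary example; generation settings and hardware placement are left to the caller):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the checkpoint and generate a completion for the prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


# Downloads the ~1.7B checkpoint on first use:
# print(generate("Simplify: (3x + 2x) * 4 ="))
```

Note that this is a base-model fine-tune, so plain-text prompting (rather than a chat template) is the appropriate usage pattern.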