Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method introduced in the DeepSeekMath paper, which focuses on enhancing mathematical reasoning. The model is optimized for tasks that require improved reasoning, and it supports a 32768-token context length.


Model Overview

This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base model, featuring 1.7 billion parameters and a substantial 32768-token context length. It was trained using the TRL framework.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It uses GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests the model is optimized for tasks that benefit from enhanced reasoning, particularly in mathematical contexts.
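The training setup described above can be sketched with TRL's `GRPOTrainer`. The reward function below is a hypothetical placeholder standing in for the "rule" reward suggested by the checkpoint name; the actual reward and training data used by the author are not documented here.

```python
def rule_based_reward(completions, **kwargs):
    """Illustrative rule-based reward: favor completions that state a boxed answer.

    This is an assumption, not the author's actual reward function.
    """
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]


def build_trainer(train_dataset):
    # trl is imported lazily so the reward function above stays dependency-free.
    from trl import GRPOConfig, GRPOTrainer

    # num_generations is the GRPO group size: completions sampled per prompt
    # whose rewards are normalized relative to each other.
    args = GRPOConfig(output_dir="qwen3-grpo", num_generations=4)
    return GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=rule_based_reward,
        args=args,
        train_dataset=train_dataset,
    )


if __name__ == "__main__":
    from datasets import load_dataset

    # Illustrative dataset; the actual training corpus is not documented.
    dataset = load_dataset("openai/gsm8k", "main", split="train")
    build_trainer(dataset).train()
```

Within each group of sampled completions, GRPO normalizes rewards relative to the group mean, removing the need for a separate value model.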

Technical Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformers Reinforcement Learning)
  • Parameter Count: 1.7 billion (listed as 2B)
  • Context Length: 32768 tokens

Use Cases

Given its GRPO training, this model is likely suitable for applications requiring:

  • Improved reasoning capabilities, especially in structured or logical problem-solving.
  • Tasks that benefit from the principles outlined in the DeepSeekMath paper, potentially including mathematical reasoning or complex logical deductions.

Quick Start Example

Users can quickly integrate the model with the transformers text-generation pipeline to explore its reasoning capabilities.
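A minimal sketch of that pipeline usage follows. The model ID matches the checkpoint name above; the prompt format and sample question are illustrative assumptions, since no chat template is documented for this base-model fine-tune.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule"


def build_prompt(question: str) -> str:
    # Plain instruction-style prompt; a chat template is not assumed here.
    return f"Question: {question}\nAnswer:"


if __name__ == "__main__":
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model=MODEL_ID,
        torch_dtype="bfloat16",  # matches the BF16 precision listed above
    )
    out = generator(
        build_prompt("If a train travels 60 km in 45 minutes, what is its average speed in km/h?"),
        max_new_tokens=256,
        do_sample=False,
    )
    print(out[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) is used here to make reasoning outputs reproducible; sampling parameters can be adjusted for more varied generations.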