Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 11, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule is a language model with approximately 2 billion parameters, fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, a reinforcement learning method designed to enhance mathematical reasoning. With a context length of 40960 tokens, it is suited to tasks requiring multi-step mathematical problem-solving and logical deduction.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, featuring approximately 2 billion parameters and a substantial context length of 40960 tokens. It was developed by Kazuki1450 and built upon the original Qwen model.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It has been fine-tuned using GRPO (Group Relative Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This specialized training is intended to substantially strengthen the model's mathematical reasoning and complex problem-solving.
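The exact training script for this checkpoint is not published. The following is a minimal sketch of what GRPO fine-tuning with TRL's `GRPOTrainer` might look like; the reward function and dataset are hypothetical placeholders (the `_rule` suffix in the model name hints at a rule-based reward, but its actual definition is not documented).

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: favor completions that produce a boxed answer.
# The reward actually used for this checkpoint is not documented.
def rule_based_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

# Placeholder math dataset; GRPOTrainer expects a "prompt" column.
train_dataset = load_dataset("openai/gsm8k", "main", split="train")
train_dataset = train_dataset.rename_column("question", "prompt")

training_args = GRPOConfig(output_dir="qwen3-1.7b-grpo")
trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```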

Technical Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformer Reinforcement Learning) version 0.23.0
  • Parameter Count: Approximately 2 billion
  • Context Length: 40960 tokens
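These values can be checked locally by inspecting the published configuration. A minimal sketch, assuming the checkpoint is reachable on the Hugging Face Hub:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule"
)
print(config.model_type)               # expected: "qwen3"
print(config.max_position_embeddings)  # expected: 40960, per the card above
```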

Potential Use Cases

Given its GRPO training, this model is particularly well-suited to applications requiring:

  • Mathematical problem-solving: From basic arithmetic to more complex algebraic or calculus-based queries.
  • Logical reasoning tasks: Where structured thought processes and deduction are critical.
  • Scientific computing assistance: Generating or interpreting mathematical expressions and concepts.

Developers can quickly integrate and experiment with this model using the Hugging Face `transformers` text-generation pipeline, as sketched below.
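A minimal sketch of such a pipeline follows. The prompt is illustrative; BF16 precision and automatic device placement mirror the quantization listed in the card and assume a suitable GPU is available.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_final_1p0_0p0_1p0_grpo_42_rule",
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```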