Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 22, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule is a 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper, and supports a context length of 40,960 tokens, making it suited to tasks that require advanced mathematical reasoning and complex, multi-step problem-solving.

Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule, is a fine-tuned version of Qwen/Qwen3-1.7B-Base with 2 billion parameters and a 40,960-token context length. It was developed by Kazuki1450 using the TRL framework.
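
Assuming the checkpoint is published on the Hugging Face Hub under this identifier, it should load with the standard transformers API. A minimal sketch (BF16 matches the quantization listed in the header):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16, per the header metadata
    device_map="auto",           # requires accelerate; places weights automatically
)
```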

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned with GRPO (Group Relative Policy Optimization), a method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO estimates advantages from the relative rewards of a group of sampled completions rather than from a separate value model, which makes it a natural fit for rule-based rewards on verifiable tasks such as mathematics. Its use here indicates a specialized focus on improving the model's ability to handle complex reasoning, particularly in mathematical domains.
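
The exact recipe behind this checkpoint is not published, but the card names both GRPO and the TRL framework, which ships a GRPOTrainer. The following is a minimal, illustrative sketch under those assumptions: the prompts, the exact-match reward rule (suggested by the "_rule" suffix in the model name), and all hyperparameters are placeholders, not the author's settings.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Rule-based reward: 1.0 if the reference answer appears in the completion,
# else 0.0. The real reward rule behind this checkpoint is undocumented;
# this exact-match check is only a stand-in.
def rule_based_reward(completions, answer, **kwargs):
    return [1.0 if ref.strip() in completion else 0.0
            for completion, ref in zip(completions, answer)]

# Placeholder prompts with verifiable answers; the actual training data
# for this checkpoint is not published.
train_dataset = Dataset.from_dict({
    "prompt": ["Question: What is 6 * 7?\nAnswer:",
               "Question: If 3x + 5 = 20, what is x?\nAnswer:"],
    "answer": ["42", "5"],
})

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",  # the base checkpoint named in this card
    reward_funcs=rule_based_reward,
    args=GRPOConfig(output_dir="qwen3-1.7b-grpo", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```

For each prompt, the trainer samples a group of `num_generations` completions, scores each with the reward function, and normalizes rewards within the group to obtain advantages, which is what lets GRPO dispense with a learned value model.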

Capabilities

  • Enhanced Reasoning: The application of GRPO suggests improved performance in tasks requiring logical deduction and problem-solving.
  • Large Context Window: A 40,960-token context length allows the model to process and generate longer, more complex inputs and outputs.

Use Cases

This model is particularly well-suited for applications that demand:

  • Mathematical problem-solving (see the inference sketch after this list).
  • Complex logical reasoning.
  • Processing extensive textual information where context retention is crucial.
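
Since the model descends from a base (non-chat) checkpoint, a plain completion-style prompt is the safest bet. Below is a hypothetical inference sketch for the first use case, again assuming the checkpoint is available on the Hub:

```python
import torch
from transformers import pipeline

# Hypothetical usage sketch; greedy decoding keeps math answers deterministic.
generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-9_1p0_0p0_1p0_grpo_1_rule",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

result = generator(
    "Question: If 3x + 5 = 20, what is x?\nAnswer:",
    max_new_tokens=128,
    do_sample=False,
)
print(result[0]["generated_text"])
```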