Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule
Text generation | Concurrency cost: 1 | Model size: 2B | Quant: BF16 | Ctx length: 32k | Published: Jan 22, 2026 | Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base with a 40960-token context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning in large language models, and is optimized for tasks that require robust mathematical problem solving and logical deduction.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base with approximately 1.7 billion parameters and a 40960-token context window. It was trained using the TRL framework.
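
Assuming the repository follows the standard Hugging Face Hub layout (which the card metadata suggests but does not confirm), loading the checkpoint with transformers should look like the minimal sketch below; the prompt is purely illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-5_1p0_0p0_1p0_grpo_1_rule"

# Load in bfloat16 to match the BF16 quantization listed on the card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain-text prompting, since this is a base-model fine-tune (example prompt only).
prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and print only the generated completion.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```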

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO samples a group of completions for each prompt and computes advantages from their relative rewards, which removes the need for a separate value (critic) model; here the training is aimed at improving proficiency on mathematical reasoning tasks.
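
The exact dataset, reward function, and hyperparameters behind this checkpoint are not published (the model name only hints at a rule-based reward and a 1e-5 learning rate). As a rough illustration, a minimal sketch of GRPO fine-tuning of the same base model with TRL's GRPOTrainer might look like the following; the dataset and rule_reward function are illustrative assumptions, not the author's recipe:

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: 1.0 if the completion contains the reference
# answer, else 0.0. The reward actually used for this model is not published.
def rule_reward(completions, answer, **kwargs):
    return [1.0 if a in c else 0.0 for c, a in zip(completions, answer)]

# Toy stand-in dataset; extra columns (here "answer") are passed to the reward
# function as keyword arguments by GRPOTrainer.
train_dataset = Dataset.from_list([
    {"prompt": "If 3x + 7 = 22, what is x?", "answer": "5"},
])

args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    learning_rate=1e-5,   # matches the "1e-5" in the model name
    num_generations=8,    # group size per prompt; an assumption
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```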

Potential Use Cases

  • Mathematical Problem Solving: Ideal for applications requiring the model to understand and solve complex mathematical problems.
  • Logical Reasoning: Suitable for tasks that benefit from enhanced logical deduction capabilities.
  • Research and Development: Can serve as a base for further experimentation in improving mathematical and reasoning performance in smaller language models.