Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule

Text generation · Model size: 2B (1.7B parameters) · Quant: BF16 · Context length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule is a 1.7-billion-parameter language model fine-tuned by Kazuki1450 from Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning. The model is optimized for tasks requiring stronger logical and mathematical processing, building on the base Qwen3 model with its 32K context length.


Model Overview

This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, featuring approximately 1.7 billion parameters and a 32,768 token context length. It leverages the Qwen3 foundation, known for its strong general language understanding.
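The model can be loaded like any Hugging Face checkpoint; a minimal sketch is below. The repo id comes from this card, while the prompt template and generation settings are illustrative assumptions, not a documented format.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_parentheses_1p0_0p0_1p0_grpo_42_rule"


def build_math_prompt(question: str) -> str:
    """Wrap a question in a minimal completion-style template.
    (Illustrative; this base-model checkpoint has no official chat template.)"""
    return f"Question: {question}\nAnswer:"


if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper above
    # stays importable without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16  # BF16, matching the card's quant field
    )

    inputs = tokenizer(build_math_prompt("What is 17 * 24?"), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```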

Key Differentiator: GRPO Training

The primary distinction of this model is its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This training approach specifically aims to enhance the model's capabilities in mathematical reasoning and problem-solving.

Training Details

  • Base Model: Qwen/Qwen3-1.7B-Base
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Methodology: GRPO, focused on improving mathematical reasoning.
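The training setup above can be sketched with TRL's `GRPOTrainer`. The actual reward function, dataset, and hyperparameters for this checkpoint are not published; the balanced-parentheses rule reward below is a guess suggested by the "parentheses" and "rule" tokens in the model name, and the dataset is a placeholder.

```python
def balanced_parentheses_reward(completions: list[str], **kwargs) -> list[float]:
    """Rule-based reward: 1.0 if parentheses in a completion are balanced, else 0.0.
    (Hypothetical; the reward actually used for this checkpoint is undocumented.)"""

    def balanced(text: str) -> bool:
        depth = 0
        for ch in text:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:  # closing paren with no matching opener
                    return False
        return depth == 0

    return [1.0 if balanced(c) else 0.0 for c in completions]


if __name__ == "__main__":
    # trl and datasets are only needed to actually launch training.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    train_dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    args = GRPOConfig(output_dir="qwen3-1.7b-grpo", num_generations=8, seed=42)
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=balanced_parentheses_reward,
        args=args,
        train_dataset=train_dataset,
    )
    trainer.train()
```

GRPO samples a group of completions per prompt and normalizes rewards within the group, so a cheap rule-based scorer like this can stand in for a learned reward model.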

Potential Use Cases

  • Applications requiring enhanced logical deduction.
  • Tasks involving mathematical problem-solving or reasoning.
  • Scenarios where a smaller, specialized model for numerical or logical tasks is beneficial.