Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule
Text generation · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Jan 12, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base, with a 40,960-token context length. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and is suited to tasks requiring advanced logical and mathematical problem-solving on top of the base Qwen3 architecture.


Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned version of the 1.7 billion parameter Qwen/Qwen3-1.7B-Base model. It supports a 40,960-token context length, allowing it to process long inputs.

Key Training Details

The model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This training approach targets tasks that benefit from enhanced mathematical and logical reasoning.
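The core idea of GRPO is to score a group of sampled completions per prompt and normalize each reward against its group, rather than training a separate value model. A minimal sketch of that group-relative advantage step (the function name and example rewards are illustrative, not from this model's training code):

```python
# Hedged sketch: group-relative advantages, the central idea of GRPO.
# For each prompt, several completions are sampled and scored (here with
# a rule-based 0/1 reward); each completion's advantage is its reward
# normalized against the group's mean and standard deviation.
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Advantage of each completion relative to its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled completions for one prompt, two judged correct.
adv = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct completions get positive advantages and incorrect ones negative, so the policy update pushes probability mass toward the better samples in each group without needing a learned critic.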

Frameworks Used

Training was conducted using the TRL library, with specific versions including:

  • TRL: 0.23.0
  • Transformers: 4.57.1
  • PyTorch: 2.7.1+cu128
  • Datasets: 4.4.1
  • Tokenizers: 0.22.1
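To reproduce this environment, the versions above can be pinned at install time. A sketch (the `+cu128` PyTorch build is a CUDA wheel and comes from the PyTorch package index, not PyPI):

```shell
# Pin the training-stack versions listed above.
pip install "trl==0.23.0" "transformers==4.57.1" "datasets==4.4.1" "tokenizers==0.22.1"
# PyTorch 2.7.1 with CUDA 12.8 wheels:
pip install "torch==2.7.1" --index-url https://download.pytorch.org/whl/cu128
```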

Potential Use Cases

Given its fine-tuning with the GRPO method, this model is likely well-suited for applications requiring:

  • Mathematical problem-solving
  • Logical reasoning tasks
  • Complex question answering where numerical or logical deduction is critical

Developers can quickly get started with text generation using the provided transformers pipeline example.
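A minimal pipeline sketch is below; the prompt is illustrative, and the model weights are downloaded from the Hub on first use:

```python
# Hedged sketch: text generation with the transformers pipeline API.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule",
)

# A math-flavored prompt, matching the model's GRPO fine-tuning focus.
outputs = generator("Question: What is 12 * 7? Answer:", max_new_tokens=64)
print(outputs[0]["generated_text"])
```

For GPU inference, passing `torch_dtype="bfloat16"` and `device_map="auto"` to `pipeline` keeps the model in its published BF16 precision.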