Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_array_1p0_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jan 12, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_array_1p0_0p0_1p0_grpo_42_rule is a 2 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base using GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning. It supports a 40,960-token context length, making it suitable for tasks that require extensive context. The fine-tuning targets complex reasoning tasks, particularly mathematical problem-solving.


Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_array_1p0_0p0_1p0_grpo_42_rule, is a 2 billion parameter language model built on the Qwen3-1.7B-Base architecture. It was fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), with the goal of substantially improving the model's proficiency in mathematical reasoning tasks.
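
The model can be loaded with the standard Transformers API. A minimal inference sketch follows; the math prompt is purely illustrative, and since this is a fine-tune of a base (non-chat) model, plain text completion is used rather than a chat template.

```python
# Minimal inference sketch (assumes transformers and torch are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_array_1p0_0p0_1p0_grpo_42_rule"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Illustrative math prompt; a base-model fine-tune expects plain completion.
prompt = "Question: What is the sum of the first 100 positive integers?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```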

Key Capabilities

  • Enhanced Mathematical Reasoning: The primary differentiator of this model is its fine-tuning with GRPO, which is designed to boost performance in mathematical problem-solving and logical deduction.
  • Large Context Window: A 40,960-token context window lets the model condition on extensive input, which helps with long, detailed prompts.
  • Base Model Foundation: Inherits the robust capabilities of the Qwen3-1.7B-Base model, providing a strong general language understanding foundation.

Training Details

The model was trained with the TRL (Transformer Reinforcement Learning) framework, version 0.23.0, together with Transformers 4.57.1 and PyTorch 2.7.1+cu128. GRPO optimizes the policy using group-relative advantages: for each prompt, several completions are sampled and scored by a reward function, and each completion's advantage is computed relative to the group's mean reward, which removes the need for a separate value model and suits tasks with verifiable, rule-checkable answers.
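
For orientation, here is a minimal sketch of what a GRPO run with TRL 0.23 looks like. The dataset, reward function, and hyperparameters below are placeholder assumptions, not the author's actual setup; the "_rule" suffix in the model name suggests a rule-based reward, so a simple rule is used for illustration.

```python
# Illustrative GRPO training sketch with TRL 0.23 (pip install trl==0.23.0).
# Dataset, reward, and hyperparameters are placeholders, not the real recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any dataset with a "prompt" column works; GSM8K-style math prompts are
# assumed here purely for illustration.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def rule_based_reward(completions, **kwargs):
    # Placeholder rule: reward completions that produce a boxed final answer.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

training_args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    num_generations=8,          # completions sampled per prompt (the "group")
    max_completion_length=512,
    bf16=True,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```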

Good For

  • Applications requiring strong mathematical reasoning.
  • Tasks involving complex logical problem-solving.
  • Scenarios where a large context window is crucial for understanding detailed prompts.