Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_result_1p0_0p0_1p0_grpo_42_rule
Text generation · Model size: 2B · Quantization: BF16 · Context length: 32K · Published: Jan 12, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_result_1p0_0p0_1p0_grpo_42_rule is a 2 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning. The model is optimized for tasks requiring mathematical problem-solving and logical deduction, and builds on the Qwen3 architecture with a 40,960-token context length.

Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_result_1p0_0p0_1p0_grpo_42_rule, is a 2 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. Its 40,960-token context length makes it suitable for processing long inputs.
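
A minimal usage sketch with Transformers follows, assuming the standard AutoModelForCausalLM loading path; the prompt and generation settings are illustrative, not part of the model card.

```python
# Hypothetical usage sketch: load the model and run a math prompt.
# Prompt and generation settings are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_result_1p0_0p0_1p0_grpo_42_rule"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card lists BF16 weights
    device_map="auto",
)

prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```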

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was trained with the GRPO (Group Relative Policy Optimization) method introduced in the DeepSeekMath paper. This training approach is designed to improve performance on complex mathematical reasoning tasks.
  • Fine-tuned Performance: Built upon the robust Qwen3-1.7B-Base, this version benefits from targeted fine-tuning to specialize its capabilities.

Training Details

The model's training used the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The fine-tuning process used the TRL framework (version 0.23.0) together with Transformers (4.57.1) and PyTorch (2.7.1+cu128). A sketch of such a setup appears below.
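
The "rule" suffix in the model name suggests a rule-based reward, a pattern TRL's GRPOTrainer supports directly. The following is a minimal, hypothetical sketch of what that setup could look like; the dataset (GSM8K), the reward rule, and all hyperparameters are illustrative assumptions, not the author's documented recipe.

```python
# Hypothetical sketch: rule-based GRPO fine-tuning with TRL's GRPOTrainer.
# Dataset, reward rule, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumed training data: GSM8K, remapped so prompts live in a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def rule_based_reward(completions, answer, **kwargs):
    # Simple rule: reward 1.0 when the gold final answer (after "####"
    # in GSM8K) appears in the completion, 0.0 otherwise.
    return [
        1.0 if a.split("####")[-1].strip() in c else 0.0
        for c, a in zip(completions, answer)
    ]

config = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    num_generations=8,         # completions sampled per prompt (the "group")
    max_completion_length=512,
    seed=42,                   # the "_42_" in the model name may hint at this seed
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

GRPO scores each group of sampled completions against the reward function and reinforces completions that beat their group's average, which is why a simple binary correctness rule like the one above can suffice as the training signal.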

Good For

  • Applications requiring strong mathematical problem-solving.
  • Tasks that benefit from advanced logical reasoning.
  • Developers looking for a Qwen3-based model with specialized mathematical capabilities.