Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_array_1p0_0p0_1p0_grpo_42_rule

Text Generation · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Mar 18, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_array_1p0_0p0_1p0_grpo_42_rule is a roughly 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO, a reinforcement learning method designed to strengthen mathematical reasoning in language models, and its 32,768-token context length lets it handle long, complex inputs. This fine-tuned variant is intended for applications that need improved accuracy on quantitative and reasoning-heavy queries.
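
As a quick orientation, the sketch below loads the model for inference with the Hugging Face transformers library; the repository id is taken from the model name above, while the prompt and generation settings are illustrative assumptions rather than recommended values.

```python
# Minimal inference sketch (assumes the `transformers` and `torch` packages are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_array_1p0_0p0_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
    device_map="auto",
)

# Illustrative math prompt; as a base-style fine-tune, plain text completion is used here.
prompt = "Question: What is 17 * 24?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```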


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_array_1p0_0p0_1p0_grpo_42_rule, is a specialized fine-tuned version of the Qwen/Qwen3-1.7B-Base model, developed by Kazuki1450. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).

Key Capabilities

  • Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with GRPO, specifically aimed at improving performance on mathematical and logical reasoning tasks.
  • Base Model: Built upon the Qwen3-1.7B-Base architecture, providing a solid foundation for language understanding and generation.
  • TRL Framework: Training was conducted with the TRL (Transformer Reinforcement Learning) library, indicating a reinforcement-learning approach to fine-tuning; see the sketch after this list.
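
For context on the training setup, the following is a minimal sketch of GRPO fine-tuning with TRL's GRPOTrainer (available in recent TRL releases); the rule-based reward function, dataset, and hyperparameters are hypothetical placeholders, not the configuration actually used to produce this model.

```python
# Rough GRPO fine-tuning sketch using TRL (assumes a recent TRL release with GRPOTrainer).
# The reward function, dataset, and hyperparameters below are placeholders for illustration only.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def rule_based_reward(completions, **kwargs):
    # Hypothetical rule: reward completions that contain a boxed final answer.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

# Placeholder prompt dataset (any dataset with a "prompt" column works for GRPOTrainer).
dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(
    output_dir="qwen3-1.7b-grpo",
    num_generations=8,          # completions sampled per prompt for the group-relative baseline
    max_completion_length=256,
    learning_rate=1e-6,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```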

Good For

  • Mathematical Problem Solving: Ideal for applications requiring the model to understand and solve complex mathematical problems.
  • Logical Deduction: Suitable for tasks that benefit from improved logical reasoning abilities.
  • Research and Experimentation: Developers interested in exploring the impact of GRPO on smaller language models for specific reasoning tasks.

This model offers a focused approach to improving reasoning capabilities within a 2 billion parameter footprint, making it a candidate for scenarios where mathematical accuracy is paramount.