Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule

Text generation · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Mar 26, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule is an approximately 2-billion-parameter causal language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning. The model targets tasks that require logical and mathematical problem solving and supports a 32,768-token context length.

Model Overview

This model is a fine-tuned variant of Qwen3-1.7B-Base developed by Kazuki1450. It has approximately 2 billion parameters and supports a context length of 32,768 tokens.
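
The card does not specify a prompt format or recommended decoding settings. The following is a minimal inference sketch using the transformers library; the math prompt, greedy decoding, and bfloat16 dtype are illustrative assumptions, not settings published by the author.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

# Illustrative prompt; plain text completion is assumed, since the model
# derives from a base (non-chat) checkpoint.
prompt = "Question: What is the sum of the first 100 positive integers?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding keeps the arithmetic output deterministic.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```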

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. GRPO estimates advantages by comparing groups of sampled completions under a shared reward, rather than training a separate value model, and is aimed at improving performance on mathematical reasoning tasks.
  • Fine-tuned with TRL: The fine-tuning used the TRL (Transformer Reinforcement Learning) framework; a hedged sketch of what such a run can look like follows this list.
  • Base Model: Built on Qwen3-1.7B-Base, which provides a solid foundation for general language understanding and generation.
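
The exact training recipe (dataset, reward rule, hyperparameters) is not published on this card. As a rough illustration of a GRPO run in TRL, the sketch below uses GSM8K as a stand-in math dataset and a hypothetical rule-based reward; the "_rule" suffix in the model name hints at a rule-based reward, but none of these choices are confirmed by the author.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K as a stand-in math dataset; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def rule_based_reward(completions, **kwargs):
    # Placeholder rule: reward completions that contain a boxed final answer.
    # The author's actual reward rule is not public.
    return [1.0 if "\\boxed{" in completion else 0.0 for completion in completions]

training_args = GRPOConfig(output_dir="Qwen3-1.7B-Base-GRPO", num_generations=8)
trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

During training, GRPOTrainer samples num_generations completions per prompt and normalizes each completion's reward within its group to form the advantage, which is what lets GRPO skip a learned value function.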

Good For

  • Mathematical Problem Solving: Thanks to its GRPO-based training, the model is particularly well-suited to applications that require accurate mathematical reasoning.
  • Research and Development: A convenient testbed for developers and researchers studying the effect of GRPO on small language models for specialized tasks, especially mathematics.
  • Size-Constrained Applications: At roughly 2 billion parameters, it is cheaper to run than larger models while retaining specialized capability in mathematical reasoning.