Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p1_1p0_grpo_dr_grpo_42_rule

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Context Length: 32k · Published: Mar 23, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p1_1p0_grpo_dr_grpo_42_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities in large language models. With a context length of 32,768 tokens, it is well suited to tasks requiring robust logical and mathematical processing, and its fine-tuning targets complex reasoning scenarios in particular.
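A minimal inference sketch using the Hugging Face `transformers` library follows. The prompt template, the `build_prompt` and `generate_solution` helpers, and the greedy decoding settings are illustrative assumptions; the model card does not specify a prompt format.

```python
# Minimal inference sketch for this checkpoint using Hugging Face transformers.
# The prompt template and decoding settings below are assumptions for
# illustration; the model card does not prescribe a prompt format.

MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p1_1p0_grpo_dr_grpo_42_rule"

def build_prompt(problem: str) -> str:
    """Wrap a math problem in a simple instruction (format is an assumption)."""
    return f"Solve the following problem step by step.\n\nProblem: {problem}\nSolution:"

def generate_solution(problem: str, max_new_tokens: int = 256) -> str:
    """Download the checkpoint and greedily decode an answer (requires network)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred heavy import

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens and return only the newly generated text.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

Calling `generate_solution("What is 17 * 24?")` would download the weights and return the model's step-by-step answer.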


Overview

This model, developed by Kazuki1450, is a fine-tuned variant of Qwen3-1.7B-Base, with roughly 1.7 billion parameters and a substantial 32,768-token context window. Its primary distinction lies in its training methodology: GRPO (Group Relative Policy Optimization). GRPO, introduced in the DeepSeekMath paper, is specifically designed to push the limits of mathematical reasoning in open language models.
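The GRPO setup described above can be sketched with the TRL library's `GRPOTrainer`, which samples a group of completions per prompt and scores them with a reward function. The toy dataset, the rule-based reward (suggested by the "rule" suffix in the model name), and the hyperparameters here are illustrative assumptions, not the author's actual training recipe.

```python
# Sketch of rule-based GRPO training with the TRL library's GRPOTrainer.
# The toy dataset, reward rule, and hyperparameters are illustrative
# assumptions; they are not the author's actual training recipe.
import re

def rule_based_reward(completions, answer=None, **kwargs):
    """Reward 1.0 when the last number in a completion matches the reference answer."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards

def launch_training():
    """Configure and run GRPO; calling this downloads the base model."""
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Tiny stand-in dataset; extra columns ("answer") are forwarded to the reward function.
    dataset = Dataset.from_dict({
        "prompt": ["What is 17 * 24?", "What is 9 + 15?"],
        "answer": ["408", "24"],
    })
    config = GRPOConfig(
        output_dir="qwen3-grpo",
        num_generations=2,        # completions sampled per prompt (the "group")
        max_completion_length=256,
        learning_rate=1e-6,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=rule_based_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO estimates each completion's advantage relative to the other completions in its group, which is why a simple scalar reward rule like the one above is sufficient; no separate value model is needed.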

Key Capabilities

  • Enhanced Mathematical Reasoning: The application of the GRPO training method suggests improved performance on tasks requiring logical and mathematical problem-solving.
  • Base Model from Qwen: Leverages the foundational capabilities of the Qwen3-1.7B-Base model, providing a strong general language understanding base.
  • Long Context Window: Supports a 32768-token context, enabling the processing of longer inputs and more complex problem descriptions.

Good For

  • Mathematical Tasks: Ideal for applications that involve mathematical reasoning, problem-solving, and logical deduction, benefiting from the GRPO fine-tuning.
  • Research and Development: Useful for researchers exploring the impact of advanced training techniques like GRPO on model performance.
  • Complex Query Handling: Its long context window makes it suitable for processing detailed instructions or multi-step problems.