Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_dr_grpo_42_rule

Hosted on Hugging Face

Text generation · Concurrency cost: 1 · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 23, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_dr_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning in large language models, making it well suited to applications where precise mathematical and logical processing is critical. The model supports a context length of 32,768 tokens.


Model Overview

This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_dr_grpo_42_rule, is a specialized fine-tuned version of the Qwen/Qwen3-1.7B-Base architecture, featuring approximately 1.7 billion parameters and a 32K token context window. It has been developed by Kazuki1450, building upon the foundational Qwen3-1.7B-Base model.

Key Differentiator: GRPO Training

The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This method is specifically designed to improve a model's proficiency in mathematical reasoning and problem-solving.
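The core idea behind GRPO is that, instead of learning a separate value model as a baseline, rewards for a group of sampled completions are normalized against the group's own mean and standard deviation. The sketch below illustrates just that group-relative advantage computation in plain Python; it is a simplified illustration of the technique from the paper, not the author's actual training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each completion's reward against the group's statistics.

    This replaces the learned value baseline used in PPO-style methods:
    completions better than the group average get a positive advantage,
    worse ones a negative advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: rule-based correctness rewards (1.0 = correct, 0.0 = incorrect)
# for four completions sampled from the same math problem.
rewards = [1.0, 0.0, 1.0, 0.0]
print(group_relative_advantages(rewards))
```

Correct completions in this group receive an advantage near +1 and incorrect ones near -1, so the policy update pushes probability mass toward the better half of the group.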

Capabilities

  • Enhanced Mathematical Reasoning: Optimized for tasks that require logical deduction and mathematical understanding due to its GRPO training.
  • Base Model Performance: Inherits the general language understanding and generation capabilities of the Qwen3-1.7B-Base model.
  • Extended Context: Supports a substantial context length of 32,768 tokens, allowing for processing longer inputs and maintaining coherence over extended conversations or documents.

Recommended Use Cases

  • Mathematical Problem Solving: Ideal for applications involving arithmetic, algebra, geometry, or other mathematical challenges.
  • Logical Reasoning Tasks: Suitable for scenarios requiring structured thought and logical inference.
  • Educational Tools: Can be integrated into systems designed to assist with learning or tutoring in STEM fields.
  • Research and Development: A strong candidate for further experimentation and fine-tuning on domain-specific mathematical datasets.
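For the use cases above, the checkpoint can be loaded with the Hugging Face transformers library like any causal LM. A minimal sketch: the repo id is taken from this card, but the prompt template and generation settings are illustrative assumptions, not values published by the author.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_dr_grpo_42_rule"


def build_prompt(problem: str) -> str:
    # Simple instruction wrapper; the model card does not specify a prompt
    # template, so this format is an assumption.
    return f"Problem: {problem}\nSolution:"


def run(problem: str, max_new_tokens: int = 256) -> str:
    # Heavy imports are kept inside the function so build_prompt() can be
    # used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(build_prompt(problem), return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Usage (downloads the ~1.7B-parameter checkpoint on first call):
# print(run("What is 17 * 24?"))
```

Since this is a base-model fine-tune, expect completion-style behavior: a plain problem/solution prompt like the one above tends to work better than a chat template.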