Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule

Task: Text Generation · Model Size: 2B · Quantization: BF16 · Context Length: 32k · Published: Mar 27, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities, making it particularly suited to tasks that require robust logical and mathematical problem-solving on top of the Qwen3-1.7B architecture.


Overview

This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule, is a specialized fine-tuned version of Qwen/Qwen3-1.7B-Base, trained with the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities & Training

The primary differentiator of this model is its training methodology: it uses GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO samples a group of completions per prompt, scores them with a reward function, and normalizes rewards within the group to compute advantages, avoiding the separate value model used by PPO. This indicates a focus on improving the model's ability to handle complex mathematical reasoning tasks.
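The "_rule" suffix in the checkpoint name suggests a rule-based reward signal, a common choice with GRPO. The exact reward used for this checkpoint is not documented, so the following is only an illustrative sketch: a toy rule reward that checks for a well-formed final answer, plus the within-group advantage normalization that GRPO applies.

```python
import re

def rule_reward(completions):
    """Toy rule-based reward: 1.0 if the completion ends with a
    '#### <number>' final answer (GSM8K-style), else 0.0.
    This is an illustration, not the reward used for this checkpoint."""
    pattern = re.compile(r"####\s*-?\d+(\.\d+)?\s*$")
    return [1.0 if pattern.search(c) else 0.0 for c in completions]

def group_advantages(rewards, eps=1e-4):
    """GRPO normalizes rewards within the group of completions sampled
    for the same prompt: advantage = (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (var ** 0.5 + eps) for r in rewards]

rewards = rule_reward(["Compute step by step... #### 42", "I am unsure."])
print(rewards)                   # [1.0, 0.0]
print(group_advantages(rewards))  # positive for the correct sample, negative otherwise
```

In TRL, a function of this shape can be passed as a reward function to its GRPO trainer, with the group size controlled by the trainer's configuration.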

Technical Details

  • Base Model: Qwen3-1.7B-Base
  • Training Framework: TRL (version 0.29.0)
  • Core Training Method: GRPO, aimed at enhancing mathematical reasoning.

Use Cases

Given its fine-tuning with GRPO, this model is particularly well-suited for applications that require:

  • Mathematical problem-solving
  • Logical reasoning tasks
  • Scientific or engineering computations where robust numerical understanding is critical.

Developers can quickly integrate and test the model using the standard transformers pipeline API for text generation.
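A minimal sketch of such a pipeline call is below. The model ID comes from this card; the prompt and generation parameters are placeholder choices, and loading the checkpoint downloads roughly 3.4 GB of BF16 weights.

```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p8_0p0_1p0_grpo_dr_grpo_42_rule",
    torch_dtype="bfloat16",  # matches the published BF16 weights
)

# An example math prompt; this is a base-style model, so plain
# completion prompting (not a chat template) is assumed here.
prompt = "Question: What is 17 * 24? Answer:"
output = generator(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"])
```

For batch evaluation or long-context use (up to the advertised 32k tokens), the same pipeline accepts a list of prompts and the usual `generate`-style keyword arguments.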