Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule is a roughly 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to enhance mathematical reasoning in large language models, and builds on the base Qwen3 architecture to target tasks that require logical and mathematical problem-solving.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with approximately 2 billion parameters and a 32K-token context window. It was developed by Kazuki1450 and trained using the TRL framework.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It leverages GRPO (Group Relative Policy Optimization), a technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO dispenses with a separate value model: for each prompt it samples a group of completions and scores each one against the group's own reward statistics, which is specifically designed to enhance a model's capabilities in mathematical reasoning and complex problem-solving.
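To illustrate the core idea, here is a minimal, self-contained sketch of GRPO's group-relative advantage computation as described in the DeepSeekMath paper: each completion's reward is normalized against the mean and standard deviation of its own sampling group. This is an illustration only, not code from this model's actual training run (which used the TRL framework); the function name and example rewards are invented for the sketch.

```python
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each reward by its group's
    mean and standard deviation (no learned value model needed)."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Hypothetical rule-based rewards for 4 sampled completions of one prompt.
rewards = [1.0, 0.0, 1.0, 0.5]
advs = group_relative_advantages(rewards)
```

Completions that beat the group average get a positive advantage and are reinforced; below-average completions get a negative one.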
Capabilities
- Enhanced Mathematical Reasoning: Benefits from GRPO training, suggesting improved performance on tasks requiring logical and mathematical understanding.
- Base Model Strengths: Inherits the foundational capabilities of the Qwen3-1.7B-Base model.
- Text Generation: Capable of generating coherent and contextually relevant text, inherited from the Qwen3 base model.
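A quick-start sketch for loading the model with the Hugging Face `transformers` library is shown below. The prompt and generation settings are illustrative assumptions, not the author's recommended configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p8_1p0_grpo_42_rule"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Example math-style prompt; this is a base-style model, so plain
# text completion is used rather than a chat template.
prompt = "If x + 3 = 7, then x ="
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```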
When to Use This Model
This model is particularly suitable for applications where:
- You need a compact model (2B parameters) with a focus on improved reasoning, especially in mathematical contexts.
- You are working with tasks that could benefit from the GRPO training approach for better logical consistency.
- You are looking for a fine-tuned Qwen3 variant with specific enhancements for problem-solving.