Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning capabilities. It is optimized for tasks requiring robust mathematical and logical processing, making it suitable for specialized reasoning applications. The model supports a context length of 40,960 tokens.
Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule, is a fine-tuned variant of Qwen/Qwen3-1.7B-Base, with approximately 1.7 billion parameters and a context length of 40,960 tokens. It was developed by Kazuki1450 and trained using the TRL framework.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It leverages GRPO (Group Relative Policy Optimization), a reinforcement learning method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models." This specialized training aims to significantly improve the model's proficiency in mathematical reasoning tasks.
Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformer Reinforcement Learning)
- Methodology: GRPO, focused on enhancing mathematical reasoning.
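The core idea of GRPO, as described in the DeepSeekMath paper, is to sample a group of completions per prompt, score each with a reward (here, rule-based), and normalize rewards within the group to obtain per-completion advantages, removing the need for a separate value network. A minimal sketch of that normalization step (illustrative only, not this repository's actual training code):

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize a group of rewards to zero mean and unit variance.

    Each reward scores one sampled completion for the same prompt;
    the normalized value is that completion's advantage in GRPO.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = (sum((r - mean) ** 2 for r in rewards) / g) ** 0.5
    if std == 0:  # all completions scored the same: no learning signal
        return [0.0] * g
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers, rewarded 1.0 when the final answer is correct.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [1.0, -1.0, -1.0, 1.0]
```

Completions scoring above the group mean get positive advantages (reinforced), those below get negative ones, which is what steers the policy toward higher-reward reasoning traces.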
Use Cases
Given its GRPO-based training, this model is particularly well-suited for applications that demand:
- Mathematical problem-solving
- Logical reasoning tasks
- Scientific computing assistance
Developers can integrate this model using the transformers library, as demonstrated in the quick start guide, to generate responses for complex questions, especially those with a mathematical or logical underpinning.
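A minimal loading sketch with the `transformers` library (this assumes the checkpoint is published on the Hugging Face Hub under the repo id above; the prompt and generation parameters are illustrative, not tuned for this checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-1_1p0_0p0_1p0_grpo_2_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampling settings here are illustrative defaults, not recommendations
# from the model author.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Since this model is fine-tuned from a base (non-chat) checkpoint, plain text prompts as above are the safest default; whether a chat template applies depends on how the fine-tune was trained.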