Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, building upon the Qwen3 architecture with a 40960-token context length.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule, is a specialized fine-tuned version of the Qwen3-1.7B-Base model. It retains the base model's 1.7 billion parameter architecture and supports an extensive context length of 40960 tokens, making it suitable for processing longer inputs.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This technique is specifically designed to improve a model's ability in mathematical reasoning and complex problem-solving.
Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformer Reinforcement Learning) version 0.23.0
- Core Method: GRPO, focused on enhancing mathematical reasoning.
Potential Use Cases
Given its GRPO-enhanced training, this model is particularly well-suited for applications requiring:
- Mathematical problem-solving
- Logical reasoning tasks
- Scientific computing assistance
- Educational tools for math and logic
Developers can quickly integrate this model using the transformers text-generation pipeline, especially for tasks that benefit from its improved reasoning capabilities.
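A minimal sketch of such an integration is shown below. It uses the standard transformers pipeline API with the model ID from this card; the prompt, generation parameters, and lazy-import structure are illustrative choices, not part of the model's documentation, and the first call will download the fine-tuned weights from the Hugging Face Hub.

```python
# Model ID from this card.
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_10_1p0_0p0_1p0_grpo_2_rule"


def build_generator():
    """Create a text-generation pipeline for this model.

    The import is deferred so this module can be loaded even in
    environments where transformers is not installed.
    """
    from transformers import pipeline

    # Downloads the fine-tuned weights on first use (several GB).
    return pipeline("text-generation", model=MODEL_ID)


if __name__ == "__main__":
    generator = build_generator()
    # Example prompt playing to the model's GRPO-trained strengths:
    # step-by-step mathematical reasoning.
    prompt = "Solve step by step: if 3x + 7 = 22, what is x?"
    result = generator(prompt, max_new_tokens=256, do_sample=False)
    print(result[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) is used here because deterministic output is usually preferable when checking mathematical answers; sampling parameters can be passed instead for more varied generations.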