Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, as detailed in the DeepSeekMath paper, to strengthen mathematical reasoning, making it suitable for tasks that demand robust logical and mathematical processing. The model supports a 32768-token context length for processing extensive inputs.
Overview
This model, Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, developed by Kazuki1450. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
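A minimal loading sketch, assuming the standard Hugging Face `transformers` auto classes (the dtype and device settings below are illustrative choices, not prescribed by this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule"

# Fetch the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # requires `accelerate`; places weights automatically
)
```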
Key Capabilities
- Enhanced Mathematical Reasoning: The primary differentiator of this model is its fine-tuning with GRPO, which aims to improve its ability to handle complex mathematical and logical reasoning tasks.
- Base Model Architecture: Built upon the Qwen3-1.7B-Base, it inherits the foundational language understanding and generation capabilities of the Qwen family.
- TRL Framework: Training was conducted with the TRL (Transformer Reinforcement Learning) library, indicating a reinforcement-learning-based fine-tuning pipeline rather than plain supervised tuning.
Training Details
The model's training procedure used the GRPO method, which targets improved mathematical reasoning in language models. The run used specific framework versions: TRL 0.29.0, Transformers 4.57.3, PyTorch 2.9.0, Datasets 4.0.0, and Tokenizers 0.22.1.
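The exact training script, reward rule, and dataset are not published in this card. As a minimal sketch of how a GRPO run is typically set up with TRL's `GRPOTrainer`, using a hypothetical rule-based reward and a placeholder dataset:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical rule-based reward: 1.0 if the completion ends with a boxed
# final answer, 0.0 otherwise. The actual rule used for this model is not
# documented in the card.
def rule_based_reward(completions, **kwargs):
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

# Placeholder dataset with a "prompt" column; the real training data is unknown.
dataset = load_dataset("trl-lib/tldr", split="train")

training_args = GRPOConfig(output_dir="Qwen3-1.7B-Base-GRPO")  # illustrative config
trainer = GRPOTrainer(
    model="Qwen/Qwen3-1.7B-Base",   # the base checkpoint named in this card
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and optimizes the policy against group-relative rewards, which is why a simple scalar rule like the one above suffices as a reward signal.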
Good For
- Applications requiring improved mathematical problem-solving.
- Tasks that benefit from enhanced logical reasoning.
- Developers looking for a compact model (1.7B parameters) with specialized reasoning capabilities; a usage sketch follows this list.
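A hedged inference example for the kind of math prompt this model targets. The prompt format is an assumption: as a base-model fine-tune, it may respond best to plain completion-style prompts rather than chat turns.

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_dr_grpo_42_rule",
)

# Completion-style prompt; format is illustrative, not documented by the card.
out = generator(
    "Question: What is the sum of the first 10 positive integers?\nAnswer:",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```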