Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper to enhance mathematical reasoning, and is optimized for tasks that require improved reasoning capabilities, with a 32768-token context length.
Model Overview
This model, developed by Kazuki1450, is a fine-tuned variant of Qwen3-1.7B-Base, with roughly 1.7 billion parameters and a 32768-token context length. It was trained using the TRL framework.
Key Differentiator: GRPO Training
The primary distinction of this model lies in its training methodology: GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for tasks that benefit from enhanced reasoning, particularly in mathematical contexts.
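Per the DeepSeekMath paper, GRPO drops PPO's learned value model and instead scores each sampled completion against the other completions in its group. A minimal sketch of that group-relative advantage computation (the function name and toy rewards are illustrative, not from this model's training code):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against the mean/std of its sampled group,
    as in GRPO, where the group baseline replaces a learned value model."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: a group of 4 sampled completions with rule-based rewards.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group average get positive advantages and are reinforced; the rest are pushed down, with no separate critic network to train.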
Technical Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Training Framework: TRL (Transformers Reinforcement Learning)
- Parameter Count: Approximately 1.7 billion
- Context Length: 32768 tokens
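The card does not include the training script, but given the TRL framework listed above, a GRPO run with a rule-based reward might look like the hedged sketch below (`GRPOTrainer`/`GRPOConfig` are assumed from recent TRL releases; the reward function and dataset are placeholders, not the ones used for this model):

```python
def length_reward(completions, **kwargs):
    """Toy rule-based reward: prefer completions near 200 characters.
    Returns one score per completion, as TRL reward functions expect."""
    return [-abs(200 - len(c)) / 200.0 for c in completions]

if __name__ == "__main__":
    # Heavy imports and the actual run are guarded; this downloads the base model.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset
    args = GRPOConfig(output_dir="grpo-output", num_generations=4)
    trainer = GRPOTrainer(
        model="Qwen/Qwen3-1.7B-Base",
        reward_funcs=length_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

The "rule" suffix in the model ID suggests a rule-based reward of some kind, but its exact form is not documented here.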
Use Cases
Given its GRPO training, this model is likely suitable for applications requiring:
- Improved reasoning capabilities, especially in structured or logical problem-solving.
- Tasks that benefit from the principles outlined in the DeepSeekMath paper, potentially including mathematical reasoning or complex logical deductions.
Quick Start Example
Users can quickly integrate the model using the transformers pipeline for text generation to explore its reasoning capabilities.
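A minimal sketch using the transformers text-generation pipeline (the model ID comes from this card; the prompt, sampling settings, and helper name are illustrative):

```python
from transformers import pipeline

MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_English_1p0_0p0_1p0_grpo_42_rule"

def build_prompt(question):
    """Frame the question to elicit step-by-step reasoning."""
    return f"Question: {question}\nAnswer step by step:"

if __name__ == "__main__":
    # Loading the model downloads ~1.7B parameters of weights.
    generator = pipeline("text-generation", model=MODEL_ID)
    prompt = build_prompt("A train travels 60 km in 45 minutes. "
                          "What is its average speed in km/h?")
    result = generator(prompt, max_new_tokens=256, do_sample=False)
    print(result[0]["generated_text"])
```

Greedy decoding (`do_sample=False`) is used here for reproducible outputs; sampling parameters can be tuned for more varied generations.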