Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_division_1p0_0p0_1p0_grpo_42_rule
Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_division_1p0_0p0_1p0_grpo_42_rule is a language model of approximately 2 billion parameters, fine-tuned from Qwen/Qwen3-1.7B-Base, with a 32768-token context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced to improve mathematical reasoning, and is intended for tasks that require mathematical problem-solving and logical deduction.
Model Overview
This model, developed by Kazuki1450, is a fine-tuned variant of Qwen3-1.7B-Base with approximately 2 billion parameters and a context length of 32768 tokens. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to improve mathematical reasoning in language models.
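The "group relative" step that gives GRPO its name can be sketched in a few lines. For each prompt, GRPO samples a group of completions, scores them, and normalizes each reward by the group's mean and standard deviation instead of using a learned value model as a baseline; in the outcome-supervision setting described in DeepSeekMath, that normalized reward serves as the advantage for every token of the corresponding completion. This is a minimal sketch, not the training code used for this model: the function name `group_relative_advantages`, the `eps` guard, and the binary example rewards are assumptions, and it uses the population standard deviation where some implementations use the sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Hypothetical helper: normalize rewards within one group of completions
    sampled for the same prompt, as in GRPO's outcome supervision."""
    mu = mean(rewards)       # group baseline (replaces a learned critic)
    sigma = pstdev(rewards)  # population std; implementations may differ
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed binary rewards for four completions of one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # prints [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group receives the same reward, all advantages are zero, so that group contributes no signal through the advantage term.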
Key Capabilities
- Mathematical Reasoning: Fine-tuned with GRPO, a method aimed at improving performance on mathematical and logical tasks.
- Base Model Foundation: Built on Qwen3-1.7B-Base, which provides its general language understanding.
- Extended Context Window: Supports up to 32768 tokens, allowing longer inputs such as extended dialogues or documents.
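When prompts approach the 32768-token limit, the prompt and the tokens to be generated must share the same window. The hypothetical helper below, `fit_to_context`, sketches one common policy: keep the most recent prompt tokens and drop the oldest. It operates on a plain list of token IDs (assumed to come from the model's tokenizer) and is an illustration only, not part of Transformers or any other library API.

```python
def fit_to_context(token_ids, max_new_tokens, context_len=32768):
    """Hypothetical helper: left-truncate a prompt so that
    prompt + generated tokens fit in the context window."""
    if not 0 <= max_new_tokens < context_len:
        raise ValueError("max_new_tokens must be in [0, context_len)")
    budget = context_len - max_new_tokens  # tokens left for the prompt
    if len(token_ids) <= budget:
        return list(token_ids)  # already fits; return a copy
    return list(token_ids[-budget:])  # keep only the most recent `budget` tokens


prompt = list(range(40_000))  # stand-in for 40,000 token IDs
trimmed = fit_to_context(prompt, max_new_tokens=1024)
print(len(trimmed), trimmed[0])  # prints 31744 8256
```

Left-truncation preserves the end of the prompt, which usually holds the actual question; for long documents, other policies such as dropping middle turns may fit better.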
Training Details
The model was trained with the TRL library (version 0.29.0), together with Transformers (4.57.3) and PyTorch (2.9.0). The GRPO method used for fine-tuning is described in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
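For reference, the GRPO objective as written in the DeepSeekMath paper (outcome-supervision variant) is reproduced below. For each question $q$, a group of $G$ outputs $o_i$ is sampled from the old policy; $\varepsilon$ is the clipping range, $\beta$ the KL coefficient, and $\pi_{\mathrm{ref}}$ the reference policy. The card does not report $G$, $\varepsilon$, $\beta$, or the reward function used for this model, and library implementations may differ in details such as loss normalization.

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta) =
\mathbb{E}_{q \sim P(Q),\ \{o_i\}_{i=1}^{G} \sim \pi_{\theta_{\mathrm{old}}}(O \mid q)}
\left[ \frac{1}{G} \sum_{i=1}^{G} \frac{1}{|o_i|} \sum_{t=1}^{|o_i|}
\left( \min\!\left( \rho_{i,t}\, \hat{A}_{i,t},\ \operatorname{clip}\!\left(\rho_{i,t}, 1-\varepsilon, 1+\varepsilon\right) \hat{A}_{i,t} \right)
- \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right] \right) \right]

\rho_{i,t} = \frac{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t} \mid q, o_{i,<t})},
\qquad
\hat{A}_{i,t} = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}

\mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right] =
\frac{\pi_{\mathrm{ref}}(o_{i,t} \mid q, o_{i,<t})}{\pi_\theta(o_{i,t} \mid q, o_{i,<t})}
- \log \frac{\pi_{\mathrm{ref}}(o_{i,t} \mid q, o_{i,<t})}{\pi_\theta(o_{i,t} \mid q, o_{i,<t})} - 1
```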
When to Use This Model
This model suits applications that require mathematical problem-solving or logical deduction, where GRPO-based reasoning training is expected to help. Its 32768-token context window also makes it usable for processing and generating longer texts.