Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base with a 40960-token context length. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and is suited to tasks requiring advanced logical and mathematical problem-solving on top of the base Qwen3 architecture.
Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned version of the 1.7 billion parameter Qwen/Qwen3-1.7B-Base model. Its 40960-token context length lets it process extensive inputs.
Key Training Details
The model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO scores a group of sampled completions per prompt and computes each completion's advantage relative to the group, which makes it well suited to reward signals from mathematical and logical reasoning tasks.
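The core idea of GRPO's group-relative advantage can be sketched in a few lines. This is an illustrative simplification, not the training code used for this model: for each prompt, several completions are sampled, each receives a scalar reward, and the advantage is the reward normalized against the group's mean and standard deviation.

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std (GRPO-style).

    A degenerate group (all rewards equal) gets zero advantages.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

For a group of rule-based rewards like `[1.0, 0.0, 1.0, 0.0]` (correct vs. incorrect answers), the correct completions get positive advantages and the incorrect ones negative, steering the policy toward answers that beat the group average.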
Frameworks Used
Training was conducted using the TRL library, with specific versions including:
- TRL: 0.23.0
- Transformers: 4.57.1
- PyTorch: 2.7.1+cu128
- Datasets: 4.4.1
- Tokenizers: 0.22.1
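To reproduce this environment, the versions above can be pinned with pip (a sketch, assuming a CUDA 12.8 build of PyTorch is available for your platform; the exact PyTorch index URL may differ):

```shell
pip install trl==0.23.0 transformers==4.57.1 datasets==4.4.1 tokenizers==0.22.1
pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
```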
Potential Use Cases
Given its fine-tuning with the GRPO method, this model is likely well-suited for applications requiring:
- Mathematical problem-solving
- Logical reasoning tasks
- Complex question answering where numerical or logical deduction is critical
Developers can get started with text generation using a standard transformers pipeline.
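A minimal sketch of such a pipeline is shown below. The model id comes from this card; the prompt, `max_new_tokens` value, and the `generate` helper name are illustrative choices, and loading the 1.7B checkpoint requires downloading its weights.

```python
from transformers import pipeline

MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_State_1p0_0p0_1p0_grpo_42_rule"

def generate(prompt, max_new_tokens=256):
    """Run text generation with the fine-tuned model (downloads weights on first use)."""
    generator = pipeline("text-generation", model=MODEL_ID)
    return generator(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
```

Calling `generate("If 3x + 5 = 20, what is x?")` would return the prompt followed by the model's completion; sampling parameters such as `temperature` can be passed through the pipeline call as needed.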