Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule is a 1.7 billion parameter language model, fine-tuned from Qwen/Qwen3-1.7B-Base. This model was trained with GRPO (Group Relative Policy Optimization), as introduced in the DeepSeekMath paper, to enhance its capabilities. It is designed for general text generation tasks, combining its base architecture with specialized training for improved performance.
Overview
This model, named Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule, is a 1.7 billion parameter language model derived from the Qwen/Qwen3-1.7B-Base architecture. It has been fine-tuned using the TRL framework.
Key Training Details
- Base Model: Qwen/Qwen3-1.7B-Base
- Fine-tuning Framework: TRL
- Training Method: GRPO (Group Relative Policy Optimization), a method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This suggests an optimization for reasoning tasks, particularly mathematical ones.
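To illustrate the idea behind GRPO: instead of learning a value function, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. The sketch below shows only that group-relative advantage computation, as described in the DeepSeekMath paper; it is not taken from this model's actual training code, and the example rewards are hypothetical.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (illustrative; not this model's actual training code).
from statistics import mean, stdev


def group_relative_advantages(rewards):
    """For a group of completions sampled from one prompt, compute
    A_i = (r_i - mean(r)) / std(r), the baseline-free advantage
    GRPO uses in place of a learned value function."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1e-8  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]


# Four completions for one prompt, scored by a rule-based reward
# (e.g., 1.0 if the final answer is correct, 0.0 otherwise):
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Completions scoring above the group mean receive a positive advantage and are reinforced; those below the mean receive a negative one. In TRL, this machinery is provided by its GRPO trainer, so fine-tuning scripts only supply the prompts and a reward function.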
Capabilities
- Text Generation: Capable of generating human-like text based on given prompts.
- Reasoning: The use of GRPO implies an enhanced focus on reasoning capabilities, particularly mathematical reasoning, the domain the DeepSeekMath paper targets.
Usage
Developers can integrate this model with the transformers library for text generation tasks. It is suitable for applications that require a compact yet capable language model with improved reasoning characteristics from its specialized training.
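A typical loading pattern with the transformers library is sketched below. The generation settings (`max_new_tokens`, greedy decoding defaults) are illustrative assumptions, not values from the model card, and as a base-derived model it expects plain-text prompts rather than a chat template.

```python
# Quick-start sketch for loading this model with transformers
# (standard AutoModel usage; generation settings are illustrative).
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_csum_6_10_rel_1e-7_1p0_0p0_1p0_grpo_2_rule"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the model on first call and return the completion for `prompt`."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example: generate("The derivative of x^2 is")
```

Calling `generate(...)` downloads roughly 3.5 GB of weights on first use; `device_map="auto"` places the model on a GPU when one is available.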