Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities, and is optimized for assistant-style interactions, combining the base architecture with specialized training for stronger conversational and reasoning performance. The model supports a context length of 40,960 tokens, making it suitable for processing longer inputs.
Model Overview
This model, Kazuki1450/Qwen3-1.7B-Base_csum_6_10_tok_assistant_1p0_0p0_1p0_grpo_42_rule, is a fine-tuned variant of the Qwen3-1.7B-Base architecture, with 1.7 billion parameters and a context length of 40,960 tokens. It was developed by Kazuki1450 and trained using the TRL framework.
Key Capabilities
- Enhanced Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to improve its mathematical and general reasoning abilities.
- Assistant-like Interactions: Fine-tuned for conversational and assistant roles, making it suitable for generating helpful and coherent responses to user queries.
- Long Context Handling: With a 40960-token context window, it can process and understand extensive inputs, maintaining coherence over longer dialogues or documents.
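For assistant-style use, prompts to Qwen-family models follow the ChatML format (role-tagged turns delimited by `<|im_start|>` and `<|im_end|>`). In practice you would call the tokenizer's `apply_chat_template`, which requires downloading the model files; the pure-Python sketch below only illustrates the assumed prompt layout:

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Format a list of {'role', 'content'} turns as a ChatML prompt,
    ending with an open assistant turn for the model to complete.
    (Illustrative sketch; the tokenizer's apply_chat_template is the
    authoritative formatter for this model.)"""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
```

The resulting string would then be tokenized and fed to the model; within the 40,960-token window this layout can carry long multi-turn histories or documents.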
Training Details
The model's training procedure utilized the TRL (Transformer Reinforcement Learning) framework. The application of the GRPO method, detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," is a core aspect of its fine-tuning, aiming to bolster its logical and mathematical problem-solving skills.
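At the core of GRPO, as described in the DeepSeekMath paper, is a group-relative advantage: several completions are sampled per prompt, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its own group. The `_rule` suffix in the model name suggests a rule-based reward, though its exact form is not documented here; the reward below is a hypothetical stand-in used only to make the sketch runnable:

```python
import statistics

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the completion ends with
    the reference answer, else 0.0. The actual rule used to train this
    model is not documented."""
    return 1.0 if completion.strip().endswith(gold_answer) else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's group-relative advantage: normalize each reward against
    the mean and (population) std of its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # all completions scored the same; no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Score a group of sampled completions for one math prompt.
group = [
    "2 + 2 = 4, so the answer is 4",
    "I think the answer is 5",
    "Adding gives the answer is 4",
    "The result is unclear",
]
rewards = [rule_based_reward(c, "4") for c in group]
advantages = group_relative_advantages(rewards)
```

In a full TRL setup these advantages weight the policy-gradient update inside the GRPO trainer; this sketch only illustrates the reward-normalization step that distinguishes GRPO from value-function-based PPO.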