Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule
Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule is a 1.5-billion-parameter instruction-tuned causal language model fine-tuned from Qwen/Qwen2.5-1.5B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning, and supports a context length of 131072 tokens, making it suited to tasks that require extended logical and mathematical problem-solving.
Overview
This model, developed by Kazuki1450, is a fine-tuned variant of the Qwen2.5-1.5B-Instruct base model, with 1.5 billion parameters and a context length of 131072 tokens. It was trained using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This training approach indicates a focus on improving the model's ability to handle complex reasoning tasks, particularly in mathematical domains.
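To try the model, loading it through the Hugging Face transformers library should look like the sketch below. This is a generic example using the standard AutoModelForCausalLM API, not a snippet taken from the repository itself.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen2.5-1.5B-Instruct_csum_6_10_tok_first_1p0_0p0_1p0_grpo_42_rule"

# Standard causal-LM loading; device_map="auto" places the weights
# on a GPU when one is available.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```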
Key Capabilities
- Enhanced Reasoning: GRPO training targets improved logical and mathematical reasoning.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts accurately and produce relevant responses (see the generation sketch after this list).
- Large Context Window: The 131072-token context length allows it to process extensive inputs, which is useful for multi-step problem-solving and long-form content generation.
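As a concrete illustration of instruction-following inference, the sketch below applies the tokenizer's chat template to a simple math question and generates a response. The prompt is a hypothetical example, and `model` and `tokenizer` are assumed to be loaded as in the snippet above.

```python
# Assumes `model` and `tokenizer` from the loading sketch above.
messages = [
    {"role": "user", "content": "If 3x + 7 = 22, what is x?"},  # hypothetical prompt
]

# apply_chat_template formats the conversation the way the
# instruction tuning expects and returns input token ids.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```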
Training Details
The model was fine-tuned with the TRL (Transformer Reinforcement Learning) library using the GRPO method. GRPO is described in the DeepSeekMath paper, which focuses on advancing mathematical reasoning in open language models.
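The exact training configuration for this checkpoint is not published here. As an illustration only, a GRPO fine-tune with TRL typically follows the pattern below; the dataset and the rule-based reward function are hypothetical placeholders, not the ones used to train this model.

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical prompt dataset; GRPOTrainer expects a "prompt" column.
dataset = load_dataset("trl-lib/tldr", split="train")

def rule_based_reward(completions, **kwargs):
    # Hypothetical rule-based reward: score 1.0 when the completion
    # contains an explicit answer marker, 0.0 otherwise.
    return [1.0 if "answer" in c.lower() else 0.0 for c in completions]

training_args = GRPOConfig(output_dir="Qwen2.5-1.5B-Instruct-GRPO")
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=rule_based_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

During training, GRPO samples a group of completions per prompt and scores each one relative to the group average, so the reward function above is the signal from which the group-relative advantages are computed.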
When to Use This Model
This model is best suited to applications that demand strong reasoning, especially mathematical or logical problem-solving. Its instruction-tuned nature makes it versatile across NLP tasks that need precise, context-aware responses, and its large context window lets it handle long, detailed queries.