eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork
eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. This model is primarily suited for tasks requiring improved logical and mathematical problem-solving, building upon its base Qwen2.5 architecture.
Overview
This model, eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of Gensyn's Qwen2.5-0.5B-Instruct base model (Gensyn/Qwen2.5-0.5B-Instruct). The fine-tuning was performed with the TRL (Transformer Reinforcement Learning) framework.
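The model card itself does not include a usage snippet, so the following is a minimal sketch of how an instruction-tuned Qwen2.5 checkpoint like this one is typically run with the Transformers chat API; the prompt and generation settings are illustrative, not taken from the source.

```python
# Hypothetical inference sketch (model id from the card; the prompt and
# generation settings below are illustrative assumptions).
model_id = "eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork"

# Instruction-tuned Qwen2.5 models expect a chat-style message list.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 23?"},
]

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # apply_chat_template formats the messages with Qwen's chat template
    # and returns input ids ready for generation.
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    output_ids = model.generate(input_ids, max_new_tokens=128)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                           skip_special_tokens=True))
```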
Key Capabilities
- Enhanced Mathematical Reasoning: This model was specifically trained using the GRPO (Group Relative Policy Optimization) method. GRPO is known for pushing the limits of mathematical reasoning in language models, as introduced in the DeepSeekMath paper.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
Good For
- Applications requiring improved logical and mathematical problem-solving capabilities.
- Tasks where a smaller, efficient model with specialized reasoning enhancements is beneficial.
- Experimentation with models fine-tuned using advanced reinforcement learning techniques like GRPO.
Training Details
The model's training procedure leveraged the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training was conducted using TRL version 0.15.2, with Transformers 4.51.3 and PyTorch 2.6.0.
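To make the GRPO idea concrete: the method samples a group of completions per prompt, scores each with a reward function, and normalizes rewards against the group's mean and standard deviation to get per-completion advantages (no learned value model). The sketch below, in plain Python, illustrates that core step with a hypothetical verifiable-correctness reward of the kind commonly used for math tasks; the helper names and the exact normalization details are assumptions, not taken from this model's training setup.

```python
import re
import statistics

def correctness_reward(completion: str, ground_truth: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the last number in the
    completion equals the ground-truth answer, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO's core step: normalize each sampled completion's reward
    against its own group's mean and standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]  # all rewards equal: no signal
    return [(r - mean) / std for r in rewards]

# Four sampled completions for the prompt "What is 12 * 7?"
completions = [
    "12 * 7 = 84. The answer is 84",
    "The answer is 74",
    "12 times 7 equals 84",
    "I think it is 85",
]
rewards = [correctness_reward(c, "84") for c in completions]
advantages = group_relative_advantages(rewards)
# Correct completions get positive advantages, incorrect ones negative;
# the policy update then pushes probability toward the former.
```

In TRL, a reward function of this shape is what `GRPOTrainer` consumes; the group normalization is handled internally by the trainer.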