eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Context Length: 32k · Published: May 3, 2025 · Architecture: Transformer

eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct using the GRPO method, which is designed to enhance mathematical reasoning. Building on the Qwen2.5 base architecture, the model is primarily suited for tasks requiring improved logical and mathematical problem-solving.

Overview

This model, eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model, developed by Gensyn. The fine-tuning process utilized the TRL (Transformer Reinforcement Learning) framework.

Key Capabilities

  • Enhanced Mathematical Reasoning: This model was specifically trained using the GRPO (Group Relative Policy Optimization) method, which was introduced in the DeepSeekMath paper to push the limits of mathematical reasoning in language models.
  • Instruction Following: As an instruction-tuned model, it follows user prompts and generates relevant responses; see the usage sketch below.
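
A minimal inference sketch using the Hugging Face transformers library follows. The prompt and generation settings are illustrative choices, not part of the published model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "eiknarf/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-rapid_stocky_stork"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The card lists BF16, so we load in bfloat16.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Qwen2.5-Instruct models use a chat template; apply it before generating.
messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At 0.5B parameters, the model loads comfortably on CPU or a small GPU in BF16.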

Good For

  • Applications requiring improved logical and mathematical problem-solving capabilities.
  • Tasks where a smaller, efficient model with specialized reasoning enhancements is beneficial.
  • Experimentation with models fine-tuned using advanced reinforcement learning techniques like GRPO.

Training Details

The model's training procedure used the GRPO method, detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). Training was conducted with TRL 0.15.2, Transformers 4.51.3, and PyTorch 2.6.0.
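
The exact datasets, prompts, and reward functions used for this checkpoint are not documented here. As an illustration of how a GRPO run is typically set up with TRL's GRPOTrainer, here is a minimal sketch; the GSM8K dataset and the correctness reward below are placeholder assumptions, not the actual training configuration:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder data: GRPOTrainer expects a "prompt" column. GSM8K is used
# here purely for illustration; the real training data is not documented.
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

# Placeholder reward: extra dataset columns (here, "answer") are passed to
# reward functions as keyword arguments. GSM8K gold answers end with
# "#### <number>", so we reward completions that contain that final number.
def correctness_reward(completions, answer, **kwargs):
    rewards = []
    for completion, gold in zip(completions, answer):
        final = gold.split("####")[-1].strip()
        rewards.append(1.0 if final in completion else 0.0)
    return rewards

training_args = GRPOConfig(output_dir="qwen2.5-0.5b-grpo", logging_steps=10)
trainer = GRPOTrainer(
    model="Gensyn/Qwen2.5-0.5B-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions per prompt and uses within-group relative rewards as the advantage signal, which avoids training a separate value model.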