Axelerate/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-flexible_bold_butterfly

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 2, 2025 · Architecture: Transformer

Axelerate/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-flexible_bold_butterfly is an instruction-following language model fine-tuned from Gensyn's Qwen2.5-0.5B-Instruct. It was trained with the GRPO method, which is designed to enhance mathematical reasoning capabilities, and is suited to tasks requiring improved logical and mathematical problem-solving on top of the base Qwen2.5 architecture.


Overview

This model, Axelerate/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-flexible_bold_butterfly, is an instruction-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model. It was fine-tuned with the TRL library using the GRPO (Group Relative Policy Optimization) training method. GRPO, introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models," targets tasks that require robust mathematical and logical reasoning.
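As with other Qwen2.5-Instruct variants, the model can plausibly be loaded through the Hugging Face `transformers` text-generation pipeline with chat-style messages. The sketch below is illustrative, not taken from the model card; the system prompt and generation settings are assumptions.

```python
def build_generator(
    model_id="Axelerate/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-flexible_bold_butterfly",
):
    # Downloads the weights on first use (~1 GB in BF16).
    from transformers import pipeline

    return pipeline("text-generation", model=model_id, torch_dtype="bfloat16")


def make_messages(question):
    # Qwen2.5-Instruct models take a chat-style message list; the system
    # prompt here is an assumption, not part of the model card.
    return [
        {"role": "system", "content": "You are a helpful math assistant."},
        {"role": "user", "content": question},
    ]


if __name__ == "__main__":
    generator = build_generator()
    output = generator(make_messages("What is 17 * 23?"), max_new_tokens=128)
    print(output[0]["generated_text"][-1]["content"])
```

The import is kept inside `build_generator` so the prompt-building helper can be used without `transformers` installed.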

Key Capabilities

  • Instruction Following: Inherits and refines the instruction-following abilities of the Qwen2.5-Instruct series.
  • Enhanced Reasoning: Benefits from GRPO training, which is associated with improved mathematical reasoning in language models.
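The core idea behind GRPO, per the DeepSeekMath paper, is to score each sampled response relative to the other responses in its group rather than against a learned value model. A minimal sketch of that group-relative advantage computation (illustrative only, not the training code):

```python
def group_relative_advantages(rewards):
    """Normalize each response's reward against its group's mean and std.

    `rewards` holds the scalar rewards for one group of sampled responses
    to the same prompt; the returned advantages replace a critic's value
    estimates in the policy-gradient update.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0:
        # All responses scored the same: no relative signal.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]
```

Responses that outperform their group get positive advantages and are reinforced; underperformers get negative ones.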

Training Details

The model was trained with specific versions of key frameworks:

  • TRL: 0.15.2
  • Transformers: 4.48.2
  • PyTorch: 2.5.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
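For reproducibility, a small check like the following (a sketch, using only the standard library) can confirm whether a local environment matches the versions listed above:

```python
import importlib.metadata

# Framework versions from the training details above.
EXPECTED = {
    "trl": "0.15.2",
    "transformers": "4.48.2",
    "torch": "2.5.1",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def check_versions(expected):
    """Return {package: installed_version_or_None} for every mismatch."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = importlib.metadata.version(pkg)
        except importlib.metadata.PackageNotFoundError:
            have = None  # Package not installed at all.
        if have != want:
            mismatches[pkg] = have
    return mismatches


if __name__ == "__main__":
    print(check_versions(EXPECTED) or "environment matches the card")
```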

Good For

  • Applications requiring a compact instruction-tuned model with a focus on logical or mathematical problem-solving.
  • Scenarios where the base Qwen2.5-0.5B-Instruct model's reasoning capabilities need a boost through specialized fine-tuning.