dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab

Text generation · Model size: 0.5B · Quantization: BF16 · Context length: 32k · Published: Jun 3, 2025 · Architecture: Transformer

dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. This model was trained using the GRPO method, which is designed to enhance mathematical reasoning capabilities. It is suitable for tasks requiring improved logical and mathematical problem-solving, building upon the Qwen2.5 architecture.


Model Overview

This model, dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab, is a fine-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model. It leverages the Qwen2.5 architecture and has been trained using the GRPO (Group Relative Policy Optimization) method.

Key Capabilities

  • Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with GRPO, a method introduced in the DeepSeekMath paper, which focuses on improving mathematical reasoning in language models.
  • Instruction-Following: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
  • TRL Framework: The fine-tuning process was conducted using the Hugging Face TRL (Transformer Reinforcement Learning) library, indicating a reinforcement learning approach to align the model with desired behaviors.

Training Details

The model's training procedure incorporated GRPO, as detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which suggests optimization for tasks that benefit from structured logical and mathematical processing. The training used TRL 0.15.2, Transformers 4.48.2, PyTorch 2.5.1, Datasets 3.6.0, and Tokenizers 0.21.1.

Use Cases

This model is particularly suited for applications where a smaller language model with improved mathematical and logical reasoning capabilities is beneficial, especially when building upon the Qwen2.5-0.5B-Instruct foundation.
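To try the model, it can be loaded through the standard `transformers` chat-template flow. A minimal sketch, assuming the card's model id resolves on the Hub; the prompt and generation settings are illustrative:

```python
# Hedged sketch: chat-style inference with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
# Apply the Qwen2.5 chat template and tokenize in one step.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Loading in BF16 matches the quantization listed above; at 0.5B parameters the model fits comfortably on CPU or a small GPU.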