dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab

Public · 0.5B params · BF16 · 32,768-token context · Jun 3, 2025

Model Overview

This model, dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab, is a fine-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model. It uses the Qwen2.5 architecture and was trained with GRPO (Group Relative Policy Optimization).

Key Capabilities

  • Enhanced Mathematical Reasoning: The primary differentiator of this model is its training with GRPO, a method introduced in the DeepSeekMath paper, which focuses on improving mathematical reasoning in language models.
  • Instruction-Following: As an instruction-tuned model, it is designed to follow user prompts and generate relevant responses.
  • TRL Framework: The fine-tuning process was conducted using the Hugging Face TRL (Transformer Reinforcement Learning) library, indicating a reinforcement learning approach to align the model with desired behaviors.
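The checkpoint can be loaded with the standard `transformers` chat-template API. A minimal sketch follows; the helper name `ask`, the generation settings, and the sample question are illustrative, not from the model card:

```python
def ask(question: str,
        model_id: str = "dgtege/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-eager_noisy_crab") -> str:
    # Heavy imports kept inside the function so the sketch reads without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

    # Build the Qwen2.5 chat prompt from a single user turn.
    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example call (downloads the checkpoint on first use):
# print(ask("What is 17 * 23?"))
```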

Training Details

The model's training procedure used GRPO, as detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which suggests an optimization for tasks that benefit from structured logical and mathematical processing. The training used TRL 0.15.2, Transformers 4.48.2, PyTorch 2.5.1, Datasets 3.6.0, and Tokenizers 0.21.1.
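In TRL, GRPO fine-tuning goes through `GRPOTrainer`, which samples a group of completions per prompt and uses their group-relative rewards as the advantage signal. The model card does not publish the actual reward function or dataset, so the sketch below assumes a hypothetical rule-based reward (`correctness_reward`) and a toy prompt set:

```python
# Hypothetical reward: score each sampled completion in a group.
# GRPO normalizes rewards within the group, so only relative scores matter.
def correctness_reward(completions, **kwargs):
    # Illustrative rule: reward completions that state a final \boxed{} answer.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

def train():
    # Heavy imports kept inside so the sketch reads without TRL installed.
    from datasets import Dataset
    from trl import GRPOConfig, GRPOTrainer

    # Toy prompt set; the real training data is not disclosed on the card.
    dataset = Dataset.from_dict({"prompt": ["What is 17 * 23?", "Factor 91."]})

    args = GRPOConfig(output_dir="qwen2.5-0.5b-grpo", num_generations=8)
    trainer = GRPOTrainer(
        model="Gensyn/Qwen2.5-0.5B-Instruct",  # the base model this card builds on
        reward_funcs=correctness_reward,
        args=args,
        train_dataset=dataset,
    )
    trainer.train()
```

Training a policy against a relative, group-normalized reward is what lets GRPO dispense with a separate value model, which is the main efficiency argument of the DeepSeekMath paper.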

Use Cases

This model suits applications that need a small language model with improved mathematical and logical reasoning, especially when building on the Qwen2.5-0.5B-Instruct foundation.