chinna6/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scaly_padded_macaw

Text generation · Model size: 0.5B · Quant: BF16 · Context length: 32k · Published: Apr 20, 2025 · Architecture: Transformer

chinna6/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scaly_padded_macaw is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained with the TRL framework using the GRPO method, which is designed to improve mathematical reasoning. With a 32768-token context length, it targets tasks that require multi-step mathematical problem-solving over longer inputs.


Overview

This model, chinna6/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scaly_padded_macaw, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model, developed by Gensyn. The fine-tuning process used the TRL library and specifically the GRPO (Group Relative Policy Optimization) method.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model's training with GRPO is based on the methodology introduced in the DeepSeekMath paper, aiming to improve its ability to handle mathematical reasoning tasks.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and instructions effectively.
  • Extended Context Window: It supports a substantial context length of 32768 tokens, allowing for processing longer inputs and generating more coherent, extended responses.
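Since this is a standard Qwen2.5-Instruct variant, it should load through the Hugging Face `transformers` API. The sketch below is a minimal, unofficial example: the system prompt, helper names, and generation settings are illustrative assumptions, not part of the model card.

```python
MODEL_ID = "chinna6/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-scaly_padded_macaw"

def build_chat(question: str) -> list[dict]:
    # Qwen2.5-Instruct models use the standard chat-message format; the
    # tokenizer's chat template turns this into the model's prompt string.
    return [
        {"role": "system", "content": "You are a helpful math assistant."},
        {"role": "user", "content": question},
    ]

def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    # Heavy imports are kept local so build_chat stays usable without
    # transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # BF16 matches the quantization listed in the metadata above.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")

    prompt = tokenizer.apply_chat_template(
        build_chat(question), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Typical usage would be `generate_answer("What is 17 * 23?")`; greedy decoding is the `generate` default here, and sampling parameters can be passed through if more varied outputs are wanted.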

Training Details

The model was trained using TRL version 0.15.2, with Transformers 4.48.2, PyTorch 2.5.1, Datasets 3.6.0, and Tokenizers 0.21.1. The GRPO method, detailed in the DeepSeekMath research paper, was a core component of its training procedure.
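To make the training setup concrete, here is a sketch of a GRPO run with TRL's `GRPOTrainer`, analogous to (but not a reproduction of) how this model was trained. The reward function, dataset columns (`prompt`, `answer`), and hyperparameters are all illustrative assumptions.

```python
def accuracy_reward(completions, **kwargs):
    """Hypothetical rule-based reward: 1.0 if the reference answer string
    appears in the completion, else 0.0. TRL passes extra dataset columns
    (here an assumed "answer" column) to reward functions as keyword lists."""
    answers = kwargs["answer"]
    return [1.0 if ans in comp else 0.0 for comp, ans in zip(completions, answers)]

def build_trainer(train_dataset):
    # Heavy imports are kept inside the function so the reward helper above
    # can be used and tested without TRL installed.
    from trl import GRPOConfig, GRPOTrainer

    args = GRPOConfig(
        output_dir="qwen2.5-0.5b-grpo",
        num_generations=8,         # completions sampled per prompt (the "group" in GRPO)
        max_completion_length=256,
    )
    return GRPOTrainer(
        model="Gensyn/Qwen2.5-0.5B-Instruct",  # the base model named above
        reward_funcs=accuracy_reward,
        args=args,
        train_dataset=train_dataset,  # assumed to have "prompt" and "answer" columns
    )
```

Calling `build_trainer(my_dataset).train()` would then run GRPO: for each prompt the trainer samples a group of completions, scores them with the reward function, and uses the group-relative advantages to update the policy, as described in the DeepSeekMath paper.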

Good For

  • Applications requiring a compact model with improved mathematical reasoning.
  • Instruction-following tasks where a longer context window is beneficial.
  • Exploration of models fine-tuned with advanced reinforcement learning techniques like GRPO.