coinex/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_armored_okapi

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quantization: BF16 · Context Length: 32k · Published: Apr 5, 2025 · Architecture: Transformer · Status: Warm

The coinex/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_armored_okapi model is a fine-tuned version of Gensyn/Qwen2.5-0.5B-Instruct, published by coinex. This 0.5-billion-parameter instruction-tuned model was trained with the TRL framework and incorporates GRPO, the reinforcement-learning method introduced in the DeepSeekMath paper, which targets stronger mathematical reasoning in open language models.


Model Overview

The coinex/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_armored_okapi is an instruction-tuned language model derived from the Gensyn/Qwen2.5-0.5B-Instruct base model and further fine-tuned with the TRL (Transformer Reinforcement Learning) framework.
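A minimal loading and generation sketch with the Hugging Face transformers library (the standard route for Qwen2.5-based checkpoints); the prompt and generation settings below are illustrative, not values published with this checkpoint:

```python
# Minimal loading sketch; the prompt and generation settings are
# illustrative, not values published with this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "coinex/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_armored_okapi"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16  # BF16, per the metadata above
)

messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Strip the prompt tokens and decode only the model's reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```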

Key Training Details

  • Base Model: Gensyn/Qwen2.5-0.5B-Instruct
  • Fine-tuning Framework: TRL (version 0.15.2)
  • Training Method: Incorporates GRPO (Group Relative Policy Optimization), introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300), which points to a focus on improving mathematical reasoning abilities; see the training sketch after this list.
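The exact training recipe is not published, but TRL 0.15.x ships a GRPOTrainer. The sketch below shows what GRPO fine-tuning of the base model could look like in that style; the dataset choice and the toy length-based reward function are assumptions for illustration, not the recipe behind this checkpoint:

```python
# Hypothetical GRPO fine-tuning sketch with TRL's GRPOTrainer.
# The dataset and reward function are illustrative stand-ins, not the
# actual recipe used for this checkpoint.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any dataset with a "prompt" column works; GSM8K is a common pick for
# math reasoning (an assumption, not confirmed by the model card).
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.rename_column("question", "prompt")

def toy_reward(completions, **kwargs):
    # Toy reward favoring completions near 200 characters. A real math
    # setup would score answer correctness instead.
    return [-abs(len(c) - 200) / 200 for c in completions]

training_args = GRPOConfig(output_dir="qwen2.5-0.5b-grpo", num_generations=4)
trainer = GRPOTrainer(
    model="Gensyn/Qwen2.5-0.5B-Instruct",
    reward_funcs=toy_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```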

Intended Use Cases

This model is particularly suitable for applications requiring:

  • Instruction following: As an instruction-tuned model, it is designed to respond effectively to user prompts.
  • Mathematical reasoning tasks: The use of GRPO in training indicates an emphasis on mathematical problems and logical reasoning; a quick smoke test follows this list.
  • Lightweight deployments: With 0.5 billion parameters, it offers a balance between performance and computational efficiency, making it suitable for environments with limited resources.
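As a quick check of the instruction-following and math behavior, a hedged smoke test using the transformers text-generation pipeline (the prompt is arbitrary, and the chat-format return shape assumes a recent transformers release):

```python
# Smoke test via the transformers pipeline; the prompt is arbitrary.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="coinex/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-aquatic_armored_okapi",
)
messages = [
    {"role": "user", "content": "A train covers 120 km in 1.5 hours. What is its average speed?"}
]
result = generator(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last turn is the reply.
print(result[0]["generated_text"][-1]["content"])
```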