tuteeee/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-carnivorous_pensive_salmon

Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 12, 2025 · Architecture: Transformer

tuteeee/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-carnivorous_pensive_salmon is a 0.5 billion parameter instruction-tuned language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the DeepSeekMath paper to enhance mathematical reasoning. The model supports a context length of 32,768 tokens and is primarily suited for tasks requiring improved reasoning, particularly in mathematical contexts.


Overview

This model, tuteeee/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-carnivorous_pensive_salmon, is a 0.5 billion parameter instruction-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model. It was fine-tuned with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper for pushing the limits of mathematical reasoning in open language models. Training was conducted with the TRL framework.
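The exact training recipe for this checkpoint is not published in the card, but a GRPO fine-tune of the named base model can be set up with TRL's GRPOTrainer roughly as follows. The dataset and reward function below are illustrative placeholders, not the configuration actually used here; a real math-reasoning run would use a dataset with verifiable answers and a reward that checks them.

```python
# Minimal GRPO fine-tuning sketch with TRL (illustrative, not the actual recipe).
# Requires: pip install trl datasets
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset with a "prompt" column; swap in a math dataset in practice.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: GRPO scores groups of sampled completions per prompt.
# A real setup would verify the final numeric answer instead.
def reward_shows_work(completions, **kwargs):
    return [float("step" in completion.lower()) for completion in completions]

training_args = GRPOConfig(output_dir="Qwen2.5-0.5B-GRPO", logging_steps=10)
trainer = GRPOTrainer(
    model="Gensyn/Qwen2.5-0.5B-Instruct",  # base model named in this card
    reward_funcs=reward_shows_work,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```

GRPO samples a group of completions for each prompt and uses within-group relative rewards as the advantage signal, avoiding the separate learned value model that PPO requires.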

Key Capabilities

  • Enhanced Mathematical Reasoning: Leverages the GRPO training method to improve performance on tasks requiring mathematical reasoning.
  • Instruction Following: Designed to follow instructions effectively due to its instruction-tuned nature (see the usage sketch after this list).
  • Compact Size: At 0.5 billion parameters, it offers a smaller footprint while aiming for specialized reasoning improvements.
  • Extended Context Window: Supports a context length of 32,768 tokens, allowing it to process longer inputs.
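The capabilities above translate into standard Qwen2.5-style chat usage. The sketch below is a minimal inference example using the transformers library; the math prompt and generation settings are illustrative choices, not values from this model card.

```python
# Minimal inference sketch (illustrative prompt and settings).
# Requires: pip install transformers torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tuteeee/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-carnivorous_pensive_salmon"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Qwen2.5-style chat messages; the math question is just an example.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "If 3x + 5 = 20, what is x? Show your steps."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```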

Good for

  • Applications requiring a compact model with improved mathematical reasoning abilities.
  • Tasks where instruction following is crucial and a smaller model size is advantageous.
  • Research and experimentation with GRPO-trained models for specialized reasoning tasks.