starfrich/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-amphibious_leaping_bison
Text generation · 0.5B parameters · BF16 · 32k context · Transformer architecture

starfrich/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-amphibious_leaping_bison is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned from unsloth/Qwen2.5-0.5B-Instruct. It was trained with the GRPO method introduced in the DeepSeekMath paper, which targets mathematical reasoning, making it suited to tasks that demand stronger logical and numerical reasoning.


Model Overview

This model, starfrich/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-amphibious_leaping_bison, is a 0.5 billion parameter instruction-tuned language model. It is a fine-tuned variant of the unsloth/Qwen2.5-0.5B-Instruct base model, developed by starfrich.

Key Differentiator: GRPO Training

A significant aspect of this model is its training methodology. It was fine-tuned using GRPO (Group Relative Policy Optimization), a method detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). This indicates a specialized focus on improving the model's mathematical reasoning and logical processing capabilities.
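The key idea behind GRPO is that it replaces the learned value baseline of PPO with a group-relative one: for each prompt, several completions are sampled, each is scored by a reward function, and a completion's advantage is its reward normalized against the group's mean and standard deviation. A minimal sketch of that advantage computation (illustrative only; the function name and `eps` parameter are assumptions, not from this model card):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: A_i = (r_i - mean(r)) / (std(r) + eps).

    Each completion is scored relative to the other completions sampled
    for the same prompt, instead of against a learned value function.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled answers to one math prompt, rewarded 1.0 if correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct completions in the group receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the better answers within each group.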

Technical Details

  • Base Model: unsloth/Qwen2.5-0.5B-Instruct
  • Parameter Count: 0.5 Billion
  • Training Framework: TRL (Transformer Reinforcement Learning)
  • Context Length: 32,768 tokens
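Like other Qwen2.5 instruct variants, the base model uses a ChatML-style chat template, which `tokenizer.apply_chat_template` handles automatically. Formatting it by hand, as sketched below, shows what string the model actually receives (a sketch assuming the fine-tune inherits the base model's template; the helper function is hypothetical):

```python
def format_chatml(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render a message list in the ChatML-style format used by Qwen2.5 templates."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 17 * 24?"},
])
print(prompt)
```

In practice, prefer passing the message list to the pipeline or to `apply_chat_template` rather than hand-building the string, so any template updates in the repository are picked up automatically.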

Use Cases

Given its GRPO-based training, this model is particularly well-suited for:

  • Mathematical problem-solving: Tasks requiring logical deduction and numerical reasoning.
  • Instruction following: Benefiting from its instruction-tuned nature.
  • Applications where enhanced reasoning is critical: Especially in domains that can leverage the GRPO method's strengths.
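For the mathematical use case, GRPO-style training pipelines typically score sampled completions with a verifiable reward: extract the final answer from the generated text and compare it to a known ground truth. A simplified sketch of such a reward function (the function name and extraction heuristic are assumptions for illustration, not details from this model card):

```python
import re

def math_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the last number in the completion equals the expected answer.

    A simplified verifiable reward of the kind used to score sampled
    completions during math-focused RL fine-tuning.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == expected else 0.0

print(math_reward("17 * 24 = 408, so the answer is 408.", "408"))  # 1.0
print(math_reward("I think the answer is 400.", "408"))            # 0.0
```

Rewards like this are what feed the group-relative advantage computation: because correctness is checkable, no learned reward model is needed for this domain.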

Quick Start Example

Users can quickly integrate and test the model using the transformers library:

from transformers import pipeline

# Build a text-generation pipeline for the fine-tuned model
# (omit device or use device=-1 to run on CPU)
generator = pipeline(
    "text-generation",
    model="starfrich/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-amphibious_leaping_bison",
    device="cuda",
)

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"

# Pass a chat-style message list; return only the newly generated text
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])