Overview
000ADI/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-muscular_miniature_kiwi is a 0.5-billion-parameter instruction-tuned language model. It is a fine-tuned variant of the Gensyn/Qwen2.5-0.5B-Instruct base model, trained with the TRL framework.
Key Training Details
The model's distinguishing characteristic is its training methodology: it was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" to improve mathematical reasoning in language models.
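For readers unfamiliar with GRPO, its core idea is to sample a group of completions for each prompt and normalize each completion's reward against the others in its group, removing the need for a separate value model. Below is a minimal, hypothetical sketch of that normalization step in plain Python; it is illustrative only and is not the training code behind this checkpoint.

import statistics

def group_relative_advantages(rewards):
    # GRPO scores G sampled completions for the same prompt and uses
    # (r_i - mean) / std within the group as the advantage signal,
    # instead of estimates from a learned value model.
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Example: rewards for four sampled answers to one math prompt
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]

TRL provides a GRPOTrainer that implements the full scheme, including sampling, reward normalization, and the policy update.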
Use Cases
Given its specialized training with GRPO, this model is particularly well-suited for:
- Tasks requiring enhanced mathematical problem-solving.
- Applications where robust reasoning in quantitative domains is crucial.
Quick Start Example
Developers can quickly integrate and test the model using the Hugging Face Transformers text-generation pipeline:
from transformers import pipeline

# Requires a CUDA-capable GPU; drop device="cuda" to run on CPU
generator = pipeline("text-generation", model="000ADI/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-muscular_miniature_kiwi", device="cuda")

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# Chat-format input; return_full_text=False returns only the newly generated reply
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
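For more control over tokenization and decoding, the same chat-style generation can be done without the pipeline helper. The following sketch uses AutoModelForCausalLM with the tokenizer's chat template; the dtype, device placement, and example prompt are illustrative choices, not requirements of the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "000ADI/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-muscular_miniature_kiwi"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the accelerate package; use .to("cuda") otherwise
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24? Show your reasoning."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))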