haedahae/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_squeaky_cheetah
Text Generation · Concurrency Cost: 1 · Model Size: 0.5B · Quant: BF16 · Ctx Length: 32k · Published: Apr 13, 2025 · Architecture: Transformer

The haedahae/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_squeaky_cheetah model is a 0.5 billion parameter instruction-tuned language model, fine-tuned from Gensyn/Qwen2.5-0.5B-Instruct. It was trained using the TRL framework with GRPO (Group Relative Policy Optimization), a reinforcement learning method designed to enhance mathematical reasoning. As a result, the model is specialized for tasks that benefit from improved reasoning capabilities, particularly in mathematical contexts.
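As a usage sketch, the model can be loaded like any Qwen2.5-Instruct checkpoint via the Hugging Face Transformers `pipeline` API. The system prompt and generation parameters below are illustrative assumptions, not values prescribed by the model card.

```python
MODEL_ID = "haedahae/Qwen2.5-0.5B-Instruct-Gensyn-Swarm-fierce_squeaky_cheetah"


def build_chat(prompt: str) -> list[dict]:
    # Qwen2.5-Instruct models expect a chat-style list of messages;
    # the system prompt here is an illustrative placeholder.
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import pipeline

    # BF16 matches the quantization listed in the model metadata above.
    pipe = pipeline("text-generation", model=MODEL_ID, torch_dtype="bfloat16")
    out = pipe(build_chat(prompt), max_new_tokens=max_new_tokens)
    # The pipeline returns the full chat transcript; keep the last turn.
    return out[0]["generated_text"][-1]["content"]
```

Calling `generate("What is 17 * 23?")` would download the ~0.5B-parameter checkpoint on first use, which is small enough to run on CPU or a modest GPU.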
