emre/Qwen-0.5B-GRPO
Text generation · Concurrency cost: 1 · Model size: 0.5B · Quantization: BF16 · Context length: 32k · Architecture: Transformer

emre/Qwen-0.5B-GRPO is a 0.5 billion parameter model, fine-tuned from Qwen2.5-0.5B-Instruct by Davut Emre Taşar using Group Relative Policy Optimization (GRPO). It specializes in generating structured, step-by-step solutions to math problems from the GSM8K dataset, emitting explicit reasoning and answer sections. The model is intended for lightweight math-reasoning assistance and educational applications, and it leverages BF16 training and vLLM for efficient inference.
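A minimal sketch of running the model locally with Hugging Face `transformers`. The example math question is illustrative, and the exact formatting of the reasoning/answer sections in the output depends on the prompt template the model was trained with, which is not specified here; for higher-throughput serving, the same model ID can be passed to vLLM instead.

```python
# Minimal sketch: generating a GSM8K-style solution with emre/Qwen-0.5B-GRPO.
# Assumes the model follows the standard Qwen2.5 chat template; the question
# below is a sample GSM8K-style problem, not taken from this model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "emre/Qwen-0.5B-GRPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [
    {"role": "user",
     "content": "Natalia sold clips to 48 of her friends in April, and then "
                "half as many in May. How many clips did she sell in total?"}
]
# Build the chat-formatted prompt and generate a step-by-step answer.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```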
