maheshrawat18/Qwen3-4B-GRPO-v5-merged
Text generation · Concurrency cost: 1 · Model size: 4B · Quantization: BF16 · Context length: 32k · Published: Mar 7, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights
maheshrawat18/Qwen3-4B-GRPO-v5-merged is a 4-billion-parameter Qwen3 model published by maheshrawat18, fine-tuned from maheshrawat18/Qwen3-4B-Thinking-2507-merged. It was trained with Unsloth and Hugging Face's TRL library, a combination Unsloth reports as training up to 2x faster. The model targets general language tasks, building on the Qwen3 architecture and this efficient training setup.
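The card itself does not include usage code, so the following is a minimal sketch of how a checkpoint like this is typically loaded with the standard Hugging Face `transformers` API. The prompt builder assumes the Qwen-family ChatML format (`<|im_start|>` / `<|im_end|>` markers); if the repository ships a chat template, `tokenizer.apply_chat_template` should be preferred over manual formatting.

```python
def build_qwen_chat_prompt(user_message: str,
                           system_message: str = "You are a helpful assistant.") -> str:
    """Format a single-turn prompt in the Qwen ChatML style (assumed format)."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def generate(prompt: str,
             model_id: str = "maheshrawat18/Qwen3-4B-GRPO-v5-merged",
             max_new_tokens: int = 256) -> str:
    """Load the model and generate a completion.

    Deferred imports: requires `transformers` and `torch`, and downloads
    the ~4B-parameter checkpoint on first call.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # BF16 matches the quantization listed on the card.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

Usage would then be `generate(build_qwen_chat_prompt("Explain GRPO in one sentence."))`; the generation call is not run here because it pulls the full checkpoint.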