yujiangw/Qwen3-1.7B-GRPO
TEXT GENERATION
- Concurrency Cost: 1
- Model Size: 2B
- Quant: BF16
- Ctx Length: 32k
- Published: Jul 1, 2025
- Architecture: Transformer
- Status: Warm

The yujiangw/Qwen3-1.7B-GRPO model is a 1.7-billion-parameter language model based on the Qwen3 architecture and fine-tuned with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath research. GRPO fine-tuning targets tasks that require advanced reasoning, so the model is intended for complex problem-solving scenarios, particularly those involving mathematical reasoning.
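To illustrate what "Group Relative" means in GRPO: the method samples a group of responses per prompt and scores each one, then normalizes each reward against the group's mean and standard deviation instead of relying on a learned value (critic) model. The sketch below is a hypothetical, minimal illustration of that normalization step, not the model's actual training code:

```python
# Minimal sketch of GRPO-style group-relative advantage computation.
# For one prompt, the policy samples a group of responses; each response's
# advantage is its reward normalized by the group's mean and standard
# deviation, which removes the need for a separate critic model.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Normalize each reward by the group mean and standard deviation."""
    mu = mean(rewards)
    # Guard against a zero-variance group (all rewards identical).
    sigma = stdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled responses scored 1.0 (correct) or 0.0 (incorrect)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)
```

Responses scoring above the group average receive a positive advantage and are reinforced; below-average responses receive a negative advantage.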
