Shreyansh327/Qwen3-1.7B-grpo-gsm8k
Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 2, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

Shreyansh327/Qwen3-1.7B-grpo-gsm8k is a Qwen3-1.7B model (listed as 2B parameters) fine-tuned by Shreyansh327 using Group Relative Policy Optimization (GRPO) on the GSM8K dataset. The model specializes in mathematical reasoning: it generates a structured chain-of-thought explanation within tags before giving a final answer. It is tuned for grade-school-level math problems, with an emphasis on accuracy and structured reasoning.
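
For readers unfamiliar with GRPO, the sketch below illustrates its core idea: several completions are sampled for the same problem, each is scored, and every score is normalized against the mean and standard deviation of its own group. This is not the author's training code. The function names, the exact-match reward on the last number in the completion, and the use of the population standard deviation are all assumptions made for illustration, and the sketch does not assume any particular reasoning-tag format.

```python
import re
import statistics


def extract_final_number(completion: str):
    """Return the last number found in the completion, or None (hypothetical helper)."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", completion)
    if not matches:
        return None
    # Drop thousands separators such as "1,000" before converting.
    return float(matches[-1].replace(",", ""))


def correctness_reward(completion: str, gold: float) -> float:
    """Assumed reward: 1.0 if the last number equals the gold answer, else 0.0."""
    pred = extract_final_number(completion)
    return 1.0 if pred is not None and abs(pred - gold) < 1e-6 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# One GSM8K-style problem with gold answer 18 and a group of four sampled completions.
completions = [
    "16 - 3 - 4 = 9 eggs, and 9 * 2 = 18",
    "16 - 3 = 13, and 13 * 2 = 26",
    "She sells 9 eggs at $2 each, so the answer is 18",
    "I am not sure.",
]
rewards = [correctness_reward(c, 18.0) for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one, so no separate value model is needed to form a baseline. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.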
