Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 20, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_42_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. The model is optimized for tasks requiring advanced reasoning, particularly in mathematical contexts, and supports a 32K-token context length.
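As a rough illustration of the GRPO idea, the method samples a group of responses per prompt, scores each one (here, presumably with a rule-based reward, per the model name's `rule` suffix), and normalizes each reward by the group's mean and standard deviation to form an advantage. The sketch below shows only that normalization step; the function name and example rewards are illustrative, not taken from this model's actual training code.

```python
# Sketch of the group-relative advantage computation used by GRPO
# (DeepSeekMath). For each prompt, G responses are sampled and scored;
# each response's advantage is its reward normalized by the group's
# statistics, so no learned value model is needed.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of scalar rewards to zero mean, unit variance."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rule-based rewards for 4 sampled completions of one prompt.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advs])  # zero-mean, unit-variance advantages
```

Responses scoring above the group mean get positive advantages (their tokens are reinforced), while below-average responses get negative ones, all without a separate critic network.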
