Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_sapo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 23, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p5_1p0_grpo_sapo_42_rule is a 1.7 billion parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base using the TRL framework. The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, which suggests it is tuned for reasoning and mathematical tasks. With a context length of 32768 tokens, it is suited to applications that require extended logical processing, particularly where mathematical reasoning is critical.
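To illustrate the core idea behind GRPO, the sketch below computes group-relative advantages: each sampled completion for a prompt is scored against the mean and standard deviation of rewards within its own sampling group, removing the need for a separate value model. This is a minimal illustration of the normalization step, not this model's actual training code; exact normalization details (e.g. biased vs. unbiased std, clipping) vary across implementations.

```python
import statistics


def group_relative_advantages(rewards):
    """Score each reward relative to its sampling group (GRPO-style).

    advantage_i = (reward_i - group_mean) / group_std
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations vary
    if std == 0:
        # All completions scored equally: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Example: four completions sampled for one prompt, rule-based 0/1 rewards
# (e.g. whether the final answer matched); correct answers get positive
# advantage, incorrect ones negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because the advantage is computed within each group of samples, prompts where every completion succeeds (or every one fails) contribute no gradient, which concentrates the policy update on prompts of intermediate difficulty.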