Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 23, 2026 · Architecture: Transformer
Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method designed to enhance mathematical reasoning in open language models, using the TRL framework for the training procedure. The model is intended for tasks that benefit from improved logical and mathematical reasoning.
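A minimal usage sketch, assuming the standard Hugging Face `transformers` text-generation API; the prompt template and generation settings below are illustrative assumptions, not a recipe documented by the model authors.

```python
MODEL_ID = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_1p0_0p2_1p0_grpo_sapo_42_rule"


def build_prompt(question: str) -> str:
    """Wrap a question in a simple step-by-step reasoning prompt.

    This template is a hypothetical example; the checkpoint's training
    prompt format is not documented on this page.
    """
    return f"Question: {question}\nLet's reason step by step.\nAnswer:"


def generate(question: str, max_new_tokens: int = 256) -> str:
    """Load the checkpoint and generate a completion.

    Note: the first call downloads the BF16 weights from the Hub,
    so the import and loading happen lazily inside the function.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16")
    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Pure prompt construction runs without downloading any weights.
print(build_prompt("What is 12 * 7?"))
```

Calling `generate(...)` fetches the full checkpoint, so it is kept out of module-level execution here.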