Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 26, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule is a 2-billion-parameter causal language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement learning method aimed at improving mathematical reasoning. The model targets tasks that require logical and mathematical problem-solving and supports a 32,768-token context length.
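The checkpoint can be loaded like any other causal language model from the Hugging Face Hub. The sketch below is illustrative only: it assumes the repository id shown above is publicly available, that the `transformers` library is installed, and that BF16 weights fit on the target device; the prompt and generation settings are placeholders, not recommended values.

```python
# Minimal inference sketch (assumptions: transformers installed, repo id is public).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_0p5_0p0_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Example math-style prompt; the model card states the fine-tune targets mathematical reasoning.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generation parameters here are illustrative defaults, not tuned for this model.
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```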
