Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Mar 18, 2026 · Architecture: Transformer · Cold

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule is a 1.7-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method known for improving mathematical reasoning in large language models. The model is optimized for tasks that require strong reasoning, particularly in mathematical contexts, and supports a 32K-token context length.
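
Below is a minimal usage sketch, assuming the checkpoint is hosted on the Hugging Face Hub and loads through the standard transformers causal-LM API; the prompt is illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kazuki1450/Qwen3-1.7B-Base_csum_3_10_1p0_0p1_1p0_grpo_42_rule"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 quantization listed above
    device_map="auto",
)

# Illustrative math prompt; GRPO-trained models are typically aimed at reasoning tasks.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```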
