Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule
Text generation | Concurrency cost: 1 | Model size: 2B | Quant: BF16 | Context length: 32k | Published: Mar 24, 2026 | Architecture: Transformer

Kazuki1450/Qwen3-1.7B-Base_dsum_3_6_rel_1e-2_1p0_0p0_1p0_grpo_42_rule is a language model of roughly 2 billion parameters (listed as 2B), fine-tuned from Qwen/Qwen3-1.7B-Base with the TRL library. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath work to improve mathematical reasoning. Because of this training objective, the model is aimed at tasks that require logical and mathematical problem-solving, which may make it useful in areas such as scientific computing and data analysis.
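The "group relative" part of GRPO refers to how advantages are computed: for each prompt, a group of completions is sampled and scored by a reward function, and each completion's advantage is its reward minus the group mean, divided by the group standard deviation. This removes the need for a separate learned value (critic) model. The sketch below shows only that advantage step in plain Python. It is an illustration under stated assumptions, not TRL's implementation: the function names, the epsilon value, and the use of the population standard deviation are choices made here, and the group size and reward function actually used for this checkpoint are not stated.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Advantage of each completion relative to its own group (same prompt).

    Assumption: population std and eps=1e-4 are illustrative choices;
    real trainers may use the sample std or a different epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Above-average completions get positive advantages, below-average negative.
    # If every reward in the group is equal, all advantages are zero.
    return [(r - mu) / (sigma + eps) for r in rewards]


def batched_advantages(flat_rewards, group_size):
    """Split a flat reward list into consecutive groups and normalize each one.

    Hypothetical helper: assumes completions for the same prompt are stored
    next to each other, group_size at a time.
    """
    if group_size <= 0 or len(flat_rewards) % group_size != 0:
        raise ValueError("len(flat_rewards) must be a positive multiple of group_size")
    advantages = []
    for start in range(0, len(flat_rewards), group_size):
        group = flat_rewards[start:start + group_size]
        advantages.extend(group_relative_advantages(group))
    return advantages


if __name__ == "__main__":
    # Two prompts, four sampled completions each, with binary rule-style rewards.
    rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(batched_advantages(rewards, group_size=4))
```

In the second group every completion receives the same reward, so all of its advantages are zero and that prompt contributes no policy-gradient signal; only groups with varied rewards move the policy.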
