Kazuki1450/Qwen3-0.6B_nseq_4_8_clean_1p0_0p0_1p0_grpo_42_rule
Text generation · Concurrency cost: 1 · Model size: 0.8B · Quant: BF16 · Context length: 32k · Published: Mar 16, 2026 · Architecture: Transformer

Kazuki1450/Qwen3-0.6B_nseq_4_8_clean_1p0_0p0_1p0_grpo_42_rule is a fine-tuned, 0.8-billion-parameter language model based on the Qwen3-0.6B architecture. It was trained with the TRL framework using GRPO (Group Relative Policy Optimization), a reinforcement-learning method introduced in the DeepSeekMath research to strengthen mathematical reasoning. The model is optimized for tasks requiring advanced mathematical problem-solving and is suitable for applications where robust mathematical reasoning is a primary requirement.
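The core idea behind GRPO can be sketched briefly: for each prompt, a group of candidate responses is sampled, each response is scored with a reward (here, rule-based verification of the answer), and each reward is normalized against the group's mean and standard deviation to form the advantage. This is a minimal illustrative sketch, not the TRL implementation; the function name and example rewards are hypothetical.

```python
# Minimal sketch of GRPO's group-relative advantage computation
# (illustrative only; TRL's actual implementation differs in detail).

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of scalar rewards to zero mean, unit std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps avoids division by zero when all rewards in the group are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Example: rule-based 0/1 rewards for 4 sampled answers to one math prompt
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses that score above the group average receive positive advantages and are reinforced; those below average receive negative advantages, without requiring a separate learned value model.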
