Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Mar 18, 2026 | Architecture: Transformer | Cold

Kazuki1450/Qwen3-1.7B-Base_csum_3_10_tok_dollars_1p0_0p0_1p0_grpo_42_rule is a language model fine-tuned from Qwen/Qwen3-1.7B-Base (1.7 billion parameters, listed here as 2B) with a 32,768-token context length. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method known for enhancing mathematical reasoning in language models. It is specifically optimized for tasks requiring improved reasoning capabilities, building upon the base Qwen3 architecture.
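The card does not document this model's training setup beyond naming GRPO, but the method's core idea can be sketched: instead of a learned value baseline, GRPO samples a group of completions per prompt and normalizes each completion's reward against the group. A minimal illustration (the rewards below are made-up numbers, not from this model's training):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO replaces a learned value-function baseline with group statistics:
    # each sampled completion's advantage is its reward, normalized by the
    # mean and standard deviation of rewards across the same prompt's group.
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical group of three completions for one prompt, scored by a reward function
print(group_relative_advantages([0.0, 0.5, 1.0]))
```

Completions scoring above the group mean get positive advantages (their tokens are reinforced), those below get negative advantages, so the policy update needs no separate critic network.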
