xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42
TEXT GENERATION
Concurrency Cost: 1
Model Size: 3.1B
Quant: BF16
Ctx Length: 32k
Published: Mar 25, 2026
Architecture: Transformer

xw1234gan/Extended_GRPO_KL_Qwen2.5-3B-Instruct_MATH_beta0.01_lr1e-05_mb2_ga128_n2048_seed42 is a 3.1-billion-parameter instruction-tuned language model based on the Qwen2.5 architecture. It is fine-tuned for mathematical reasoning and problem-solving using an extended GRPO training approach with a KL-divergence penalty; the run name appears to encode the key hyperparameters (KL coefficient beta = 0.01, learning rate 1e-05, likely micro-batch size 2 and gradient accumulation 128, n = 2048, seed 42). With a context length of 32,768 tokens, it targets applications requiring robust mathematical understanding and generation in complex quantitative domains.
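To make the training objective concrete, here is a minimal pure-Python sketch of GRPO with an explicit KL penalty. The group-relative advantage normalization and the k3 KL estimator follow the standard GRPO formulation; the beta = 0.01 value matches the run name, but the actual training code for this model is not published here, so treat this as an illustrative assumption.

```python
import math

def grpo_advantages(rewards):
    # GRPO's critic-free baseline: normalize each sampled response's
    # reward by the group mean and standard deviation.
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var) or 1.0  # guard against zero std for constant rewards
    return [(r - mean) / std for r in rewards]

def grpo_kl_objective(logp_new, logp_old, logp_ref, advantage, beta=0.01):
    # Per-token objective: importance ratio times advantage, minus a
    # KL penalty toward the frozen reference policy, weighted by beta.
    ratio = math.exp(logp_new - logp_old)
    # k3 estimator of KL(pi_theta || pi_ref): unbiased and non-negative.
    kl = math.exp(logp_ref - logp_new) - (logp_ref - logp_new) - 1.0
    return ratio * advantage - beta * kl
```

For example, a group of rewards [1.0, 0.0] normalizes to advantages [1.0, -1.0]; when the current, old, and reference log-probs coincide, the KL term vanishes and the objective reduces to the advantage itself.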
