TMLR-Group-HF/GT-Qwen3-4B-Base-MATH
Text Generation | Concurrency Cost: 1 | Model Size: 4B | Quant: BF16 | Ctx Length: 32k | Published: Aug 5, 2025 | License: mit | Architecture: Transformer | Open Weights | Warm

TMLR-Group-HF/GT-Qwen3-4B-Base-MATH is a 4-billion-parameter Qwen3-Base model from TMLR-Group-HF, trained with the Ground Truth (GT) GRPO method on the MATH dataset. The model is optimized for mathematical reasoning tasks and leverages the Co-rewarding self-supervised reinforcement learning framework. That framework aims to improve reasoning in large language models by addressing the stability and scaling challenges of self-rewarding methods.
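
As a rough illustration of what training with GRPO on ground-truth rewards can look like, the sketch below scores a group of sampled answers against a reference answer and normalizes the rewards within the group, which is the group-relative advantage at the core of GRPO. The exact-match reward, the function names, the sample data, and the use of the population standard deviation are assumptions made for illustration; the page does not describe the authors' actual reward function or training code.

```python
from statistics import mean, pstdev


def ground_truth_rewards(answers, reference):
    """Binary reward (assumed): 1.0 if a sampled answer matches the reference, else 0.0."""
    return [1.0 if a.strip() == reference.strip() else 0.0 for a in answers]


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled final answers for a single problem.
sampled = ["42", "41", "42", "7"]
rewards = ground_truth_rewards(sampled, reference="42")
advantages = group_relative_advantages(rewards)

for answer, reward, adv in zip(sampled, rewards, advantages):
    print(f"answer={answer!r} reward={reward} advantage={adv:+.3f}")
```

Correct answers receive a positive advantage and incorrect ones a negative advantage. When every answer in a group earns the same reward, all advantages are zero, so that group contributes no learning signal.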
