TMLR-Group-HF/GT-Qwen3-4B-Base-DAPO14k
Text Generation · Concurrency Cost: 1 · Model Size: 4B · Quant: BF16 · Ctx Length: 32k · Published: Oct 3, 2025 · License: MIT · Architecture: Transformer · Open Weights

GT-Qwen3-4B-Base-DAPO14k, from TMLR-Group-HF, is a 4-billion-parameter Qwen3-4B-Base model fine-tuned on the DAPO-14k dataset using the Co-rewarding self-supervised reinforcement learning framework. It is optimized to strengthen reasoning in large language models, particularly on complex mathematical tasks. The model supports a context length of 40,960 tokens and aims to improve training stability and reasoning-benchmark performance without relying on extensive human-annotated labels.
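A minimal usage sketch with the Hugging Face `transformers` library, assuming the standard causal-LM loading path applies to this checkpoint; the prompt, generation settings, and `device_map` choice are illustrative assumptions, not taken from this card:

```python
# Hypothetical loading sketch for this model card's checkpoint.
# Only the model id comes from the card; everything else is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TMLR-Group-HF/GT-Qwen3-4B-Base-DAPO14k"


def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Download the weights (several GB) and run one greedy-ish generation."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the BF16 precision listed above
        device_map="auto",           # place layers on available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Guarded demo call: loading the 4B model requires a network
    # connection and substantial memory.
    print(generate("What is the sum of the first 100 positive integers?"))
```

Since the underlying checkpoint is a base model tuned for math reasoning rather than chat, plain-text prompts like the one above are likely the intended input format.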
