Thrillcrazyer/Qwen-7B_TAC_GRPO
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Jan 7, 2026 | Architecture: Transformer | Cold

Thrillcrazyer/Qwen-7B_TAC_GRPO is a 7.6-billion-parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained with GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to improve mathematical reasoning. Training used the TRL framework. The model is intended for tasks that require multi-step reasoning, particularly in mathematical contexts.
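
As a rough illustration of the idea behind GRPO, the sketch below computes group-relative advantages: several completions are sampled for the same prompt, each is scored, and every reward is normalized against the mean and standard deviation of its own group. This is not the training code for this model. The function name `group_relative_advantages`, the `eps` value, the choice of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the statistics of its own group.

    rewards: scores for completions sampled from the SAME prompt.
    eps: small constant (assumed value) to avoid division by zero
         when every completion in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 if the sampled answer was correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.4f}")
```

Completions that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to provide a baseline. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.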
