Thrillcrazyer/QWEN7_GRPO
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Nov 27, 2025 | Architecture: Transformer | Cold

Thrillcrazyer/QWEN7_GRPO is a 7.6-billion-parameter instruction-tuned causal language model, fine-tuned from Qwen/Qwen2.5-7B-Instruct. It was trained on the DeepMath-103k dataset with GRPO (Group Relative Policy Optimization), a reinforcement learning method, to improve mathematical reasoning. The model targets complex mathematical problem-solving and logical deduction, making it suitable for applications that require quantitative reasoning.
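
To give a rough idea of what "group relative" means in GRPO, the sketch below computes advantages for a group of sampled answers to one prompt by normalizing each reward against the group's mean and standard deviation. This is an illustration only, not this model's training code: the function name, the 0/1 rewards, the epsilon value, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group's mean and std.

    `rewards` holds one scalar reward per sampled completion of the
    same prompt. `eps` (an assumed value) avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 if the final answer matched the reference, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every answer scored the same yields zero advantages.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

Answers that score above their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.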
