Thrillcrazyer/Qwen-7B_TAC_GSPO

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Jan 6, 2026 · Architecture: Transformer

Thrillcrazyer/Qwen-7B_TAC_GSPO is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B-Instruct. It specializes in mathematical reasoning, having been trained on the DeepMath-103k dataset using the GRPO method. The model is optimized for complex mathematical problem-solving and advanced reasoning tasks, leveraging its 131,072-token context length.


Overview

Thrillcrazyer/Qwen-7B_TAC_GSPO is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. Its primary distinction lies in its specialized training for mathematical reasoning, utilizing the DeepMath-103k dataset.

Key Capabilities

  • Enhanced Mathematical Reasoning: The model was trained using the GRPO (Group Relative Policy Optimization) method, introduced in the DeepSeekMath paper, to significantly improve its ability to understand and solve complex mathematical problems.
  • Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-7B-Instruct base.
  • Large Context Window: Features a substantial 131,072-token context length, allowing it to process and reason over extensive inputs.
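The capabilities above can be exercised through the standard Hugging Face `transformers` chat API. The sketch below is illustrative, not an official usage snippet from the model author: the system prompt and generation settings are assumptions, and the heavy imports are deferred so the prompt-building helper can be used on its own.

```python
MODEL_ID = "Thrillcrazyer/Qwen-7B_TAC_GSPO"


def build_messages(problem: str) -> list[dict]:
    """Wrap a math problem in a chat-format message list (system prompt is an assumption)."""
    return [
        {"role": "system", "content": "You are a helpful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]


def solve(problem: str, max_new_tokens: int = 1024) -> str:
    """Generate a solution with the fine-tuned model via transformers."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred: requires GPU-scale download

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

    # Render the chat messages with the model's own chat template.
    prompt = tokenizer.apply_chat_template(
        build_messages(problem), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
```

Because the base model is Qwen2.5-7B-Instruct, the tokenizer's built-in chat template handles the role formatting, so no manual special-token handling is needed.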

Training Details

The model's fine-tuning process leveraged the TRL framework. The GRPO method, central to its training, is detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
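A GRPO run of this kind can be sketched with TRL's `GRPOTrainer`. This is a minimal illustration under assumptions, not the author's exact recipe: the dataset identifier, column names, reward rule, and hyperparameters are all placeholders, and the training loop itself is only outlined.

```python
import re


def correctness_reward(completions, answer, **kwargs):
    """Score 1.0 when a completion's final \\boxed{...} matches the reference answer.

    This reward rule is an assumption for illustration; GRPO only requires
    some scalar reward per sampled completion.
    """
    rewards = []
    for completion, ref in zip(completions, answer):
        boxed = re.findall(r"\\boxed\{([^}]*)\}", completion)
        rewards.append(1.0 if boxed and boxed[-1].strip() == str(ref).strip() else 0.0)
    return rewards


def train():
    """Outline of a TRL GRPO fine-tuning run (dataset id and config are assumptions)."""
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    dataset = load_dataset("zwhe99/DeepMath-103K", split="train")  # hypothetical dataset id

    config = GRPOConfig(
        output_dir="qwen-7b-grpo",
        num_generations=8,          # completions sampled per prompt for group-relative advantages
        max_completion_length=2048,
    )
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",
        reward_funcs=correctness_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()
```

GRPO's key property, per the DeepSeekMath paper, is that advantages are computed relative to the group of sampled completions for each prompt, removing the need for a separate value model.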

Good For

  • Applications requiring advanced mathematical problem-solving.
  • Research and development in AI for mathematical reasoning.
  • Tasks that benefit from a model specifically optimized for numerical and logical deduction.