Overview
Thrillcrazyer/Qwen-7B_TAC_GSPO is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-7B-Instruct base model. Its primary distinction is specialized training for mathematical reasoning on the DeepMath-103k dataset.
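A minimal inference sketch using the Hugging Face `transformers` API. The model id comes from this card; the system prompt, generation settings, and example problem are illustrative assumptions, not part of the model's documented usage.

```python
MODEL_ID = "Thrillcrazyer/Qwen-7B_TAC_GSPO"

def build_messages(problem: str) -> list[dict]:
    # Qwen2.5-Instruct models use a chat format; the step-by-step system
    # prompt here is an assumption, not a requirement of the model.
    return [
        {"role": "system", "content": "You are a careful math assistant. Reason step by step."},
        {"role": "user", "content": problem},
    ]

def main() -> None:
    # Imports are local so the helpers above work without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = build_messages("Find all real x with x^2 - 5x + 6 = 0.")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512)
    # Decode only the newly generated tokens.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```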
Key Capabilities
- Enhanced Mathematical Reasoning: The model was trained using GRPO (Group Relative Policy Optimization), the method introduced in the DeepSeekMath paper, to significantly improve its ability to understand and solve complex mathematical problems.
- Instruction Following: Inherits strong instruction-following capabilities from its Qwen2.5-7B-Instruct base.
- Large Context Window: Supports a 131,072-token context length, allowing it to process and reason over long inputs.
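The advertised context window can be checked from the model config alone (no weight download), assuming the standard `transformers` `AutoConfig` API and that the limit is stored in `max_position_embeddings`, as is typical for Qwen2.5 models:

```python
# Context length as stated on this card.
CONTEXT_LENGTH = 131_072

def fetch_context_length(model_id: str = "Thrillcrazyer/Qwen-7B_TAC_GSPO") -> int:
    # Local import so CONTEXT_LENGTH is usable without transformers installed.
    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained(model_id)
    return cfg.max_position_embeddings

if __name__ == "__main__":
    print(fetch_context_length())
```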
Training Details
Fine-tuning was performed with the TRL framework. The GRPO method central to the training is detailed in "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).
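A hedged sketch of what GRPO fine-tuning with TRL's `GRPOTrainer` looks like. The reward function, dataset id (`zwhe99/DeepMath-103K`), column names, and hyperparameters are illustrative assumptions, not the authors' exact recipe; TRL passes extra dataset columns (here `ground_truth`) through to the reward function as keyword arguments.

```python
def accuracy_reward(completions, ground_truth, **kwargs):
    # Toy binary reward: 1.0 if the reference answer string appears in the
    # completion, else 0.0. Real math rewards usually parse and verify answers.
    return [1.0 if gt in c else 0.0 for c, gt in zip(completions, ground_truth)]

def main() -> None:
    # Local imports keep the reward function testable without these packages.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    # Assumed Hub id for DeepMath-103k; adjust to the actual dataset used.
    dataset = load_dataset("zwhe99/DeepMath-103K", split="train")
    config = GRPOConfig(output_dir="qwen-grpo-math", num_generations=8)
    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-7B-Instruct",
        reward_funcs=accuracy_reward,
        args=config,
        train_dataset=dataset,
    )
    trainer.train()

if __name__ == "__main__":
    main()
```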
Good For
- Applications requiring advanced mathematical problem-solving.
- Research and development in AI for mathematical reasoning.
- Tasks that benefit from a model specifically optimized for numerical and logical deduction.