Name: zhaohq/GSPO-7B-v5-main API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

zhaohq/GSPO-7B-v5-main is a 7.6-billion-parameter language model fine-tuned from the Qwen/Qwen2.5-Math-7B base model. It supports a context length of 32,768 tokens, which suits long, multi-step inputs.

Key Capabilities

Enhanced Mathematical Reasoning: The model was trained with GRPO (Group Relative Policy Optimization), the reinforcement learning method introduced in the DeepSeekMath paper. GRPO samples a group of responses for each prompt and scores each response relative to the rest of its group. The training is intended to improve the model's handling of mathematical problems and logical reasoning tasks.
Fine-tuned with TRL: Fine-tuning used TRL (Transformer Reinforcement Learning), the Hugging Face library for training language models with reinforcement learning methods.
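
The central idea of GRPO, as described in the DeepSeekMath paper, is to score each sampled response against the other responses drawn for the same prompt instead of against a learned value model. The following is a minimal sketch of that normalization step only, not the training code used for this model. The function name, the example rewards, and the `eps` constant are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group.

    `rewards` holds one scalar reward per response sampled for a single prompt.
    `eps` is an assumed small constant that avoids division by zero when every
    response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math problem:
# 1.0 for a correct final answer, 0.0 for an incorrect one.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's average receive a positive advantage and are reinforced; those below average receive a negative one. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.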

Good For

Applications requiring strong mathematical problem-solving.
Tasks that benefit from advanced logical reasoning capabilities.
Research and development in areas involving complex numerical or symbolic manipulation.
