Name: zhaohq/GRPO-7B-fmt03-math API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Overview

zhaohq/GRPO-7B-fmt03-math is a 7.6 billion parameter language model, fine-tuned from the Qwen/Qwen2.5-Math-7B base model. This model leverages the GRPO (Gradient-based Reward Policy Optimization) training method, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". The fine-tuning was performed using the TRL framework.

Key Capabilities

Enhanced Mathematical Reasoning: Specifically optimized for handling complex mathematical problems and reasoning tasks, building upon the strong foundation of Qwen2.5-Math-7B.
Large Context Window: Supports a context length of 32768 tokens, allowing for processing and understanding extensive mathematical problems or related textual information.
GRPO Training: Benefits from a specialized training approach designed to improve performance in mathematical domains, as outlined in the DeepSeekMath paper.

Good for

Applications requiring robust mathematical problem-solving.
Research and development in AI for advanced numerical reasoning.
Tasks that benefit from a model specifically trained to excel in mathematical contexts.

Overview

Overview

Key Capabilities

Good for

Full Model Card (README)