Name: zhaohq/GRPO-7B-ls-v1-fullepoch-hotpot API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: zhaohq

Model Overview

This model, zhaohq/GRPO-7B-ls-v1-fullepoch-hotpot, is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-7B. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning method introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". GRPO scores each sampled completion relative to the other completions sampled for the same prompt, so it does not need a separate value model. The aim is to improve the model's reasoning abilities, particularly in mathematical contexts.
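The group-relative scoring at the core of GRPO can be illustrated with a minimal sketch. It is not this model's training code: the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration. Given the rewards of several completions sampled for one prompt, each completion's advantage is its reward normalized by the group's mean and standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from the SAME prompt.
    eps: small constant (assumed value) that avoids division by zero
         when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt
# (1.0 = correct answer, 0.0 = incorrect answer).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat the group average receive a positive advantage and are reinforced; those below it receive a negative one. When all completions in a group earn the same reward, every advantage is zero and that group contributes no learning signal.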

Key Capabilities

Enhanced Reasoning: Fine-tuned with GRPO with the goal of improving performance on complex reasoning tasks.
Mathematical Proficiency: Trained with the method introduced in the DeepSeekMath research, which targets mathematical problem-solving.
Large Context Window: Supports a context length of 32768 tokens, which allows long inputs to be processed in a single pass.
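As a rough illustration of what the 32768-token context length means in practice, the sketch below checks whether a request fits. It assumes the window is shared between the prompt and the generated tokens, as is usual for decoder-only models; the helper names are hypothetical, and token counts are passed in as plain integers rather than computed with a tokenizer.

```python
CONTEXT_LENGTH = 32768  # context length stated for this model


def fits_in_context(prompt_tokens, max_new_tokens, context_length=CONTEXT_LENGTH):
    """Return True if prompt plus requested output fits in the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_length


def max_new_tokens_allowed(prompt_tokens, context_length=CONTEXT_LENGTH):
    """Return how many tokens can still be generated after the prompt."""
    return max(0, context_length - prompt_tokens)


# Hypothetical request: a 30000-token prompt asking for 2768 new tokens.
print(fits_in_context(30000, 2768))
print(max_new_tokens_allowed(30000))
```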

Training Details

The model was fine-tuned using the TRL library (version 0.16.0.dev0) with Transformers 4.48.3. The training run is publicly logged and can be viewed on Weights & Biases.

Use Cases

Mathematical Problem Solving: Intended for applications that require multi-step mathematical reasoning and calculation.
Complex Logical Inference: Suited to tasks that involve logical deduction over long contexts, up to the 32768-token limit.
Research and Development: A candidate for further research into the reasoning capabilities of GRPO-trained language models.
