The sravanthib/Qwen-2.5-7B-Simple-RL model is a 7.6 billion parameter language model fine-tuned from Qwen/Qwen2.5-Math-7B. It was trained with GRPO, the reinforcement learning method introduced in DeepSeekMath, making it particularly suited to mathematical reasoning tasks. The model supports a 131,072-token context length, enhancing its ability to process and generate longer, more complex responses.
Overview
This model, sravanthib/Qwen-2.5-7B-Simple-RL, is a 7.6 billion parameter language model built upon the Qwen/Qwen2.5-Math-7B base. It was fine-tuned with the TRL (Transformer Reinforcement Learning) framework, specifically using the GRPO (Group Relative Policy Optimization) method. GRPO was introduced in the DeepSeekMath research, which focuses on enhancing mathematical reasoning capabilities in large language models.
Key Capabilities
- Enhanced Mathematical Reasoning: Fine-tuned with GRPO, suggesting improved performance on complex mathematical problems and logical deduction.
- Large Context Window: Supports a substantial context length of 131,072 tokens, allowing for processing and generating extensive text sequences.
- Reinforcement Learning Fine-tuning: Leverages advanced RL techniques for potentially more aligned and coherent outputs.
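Like other Qwen2.5-based causal language models on the Hub, this model should be loadable through the standard `transformers` API. The snippet below is an untested sketch: it assumes the checkpoint ships the base model's tokenizer and chat template, and it requires downloading the ~7.6B-parameter weights (a GPU is strongly recommended).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sravanthib/Qwen-2.5-7B-Simple-RL"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers on available GPU(s)
)

# Build a prompt with the tokenizer's chat template
messages = [{"role": "user", "content": "Solve: if 3x + 7 = 22, what is x?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens
print(tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
))
```

Generation parameters (temperature, top-p, etc.) can be passed to `model.generate` as usual; for math tasks, greedy or low-temperature decoding is a common starting point.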
Training Details
The model's training procedure involved GRPO, a method detailed in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). The training utilized TRL version 0.16.0.dev0, with Transformers 4.49.0 and PyTorch 2.5.1.
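The core idea of GRPO, as described in the DeepSeekMath paper, is to drop the learned value function of PPO and instead compute advantages by normalizing each sampled completion's reward against the mean and standard deviation of its own group of samples. The sketch below illustrates only that normalization step in plain Python; the actual TRL training loop adds clipping, a KL penalty, and per-token handling.

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize each reward by the mean and
    (population) std of its own group of sampled completions, so no
    separate critic/value model is needed."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions sampled for the same prompt, scored 1.0 if the final
# answer is correct and 0.0 otherwise (a typical rule-based math reward).
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)  # correct samples get positive advantage, incorrect get negative
```

Because the normalization is within-group, the advantages always sum to (approximately) zero: correct completions are pushed up exactly as much as incorrect ones are pushed down, relative to the group's average reward.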