Name: pawin205/Qwen-7B-REMOR-GRPO-no-think API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: pawin205

Model Overview

pawin205/Qwen-7B-REMOR-GRPO-no-think is a 7.6-billion-parameter language model built on the pawin205/Qwen-7B-REMOR-SFT-no-think base. It was further fine-tuned with GRPO (Group Relative Policy Optimization), the method introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models".
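To make the "group relative" part of GRPO concrete, the sketch below shows the advantage normalization described in the DeepSeekMath paper: several completions are sampled for one prompt, and each completion's reward is scored against the mean and standard deviation of its own group. This is a minimal illustration, not this model's training code. The function name, the epsilon value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to the other completions for the same prompt.

    `rewards` holds one scalar reward per sampled completion. The `eps` term
    (an assumed value) avoids division by zero when all rewards are equal.
    """
    if not rewards:
        return []
    baseline = mean(rewards)   # group mean acts as the baseline
    spread = pstdev(rewards)   # population std; an assumption of this sketch
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
example_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(example_rewards)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one, so no separately trained value model is needed to provide a baseline.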

Key Capabilities

Enhanced Mathematical Reasoning: The model is optimized for complex mathematical reasoning tasks through the GRPO training procedure, which is its main difference from the SFT base.
Fine-tuned Performance: Training used the TRL (Transformer Reinforcement Learning) library, which provides the GRPO implementation applied on top of the SFT base.
Qwen-7B Base: Inherits the foundational capabilities of the Qwen-7B architecture, providing a robust base for its specialized mathematical reasoning.

Training Details

The model was trained using TRL 0.24.0, Transformers 4.57.1, and PyTorch 2.8.0+cu129. The GRPO method, central to its training, aims to improve the model's ability to generate accurate and logical mathematical solutions. This makes the model well suited to use cases that require precise numerical and logical problem-solving, which distinguishes it from general-purpose language models.
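When trying to reproduce or extend this setup, it can help to compare a local environment against the versions listed above. The following is a small sketch using only the Python standard library. The distribution names (`trl`, `transformers`, `torch`) are assumed to be the PyPI names of the listed libraries, and the helper names are hypothetical.

```python
from importlib import metadata

# Versions as reported in the training details above.
# The distribution names used as keys are assumptions.
REPORTED_VERSIONS = {
    "trl": "0.24.0",
    "transformers": "4.57.1",
    "torch": "2.8.0+cu129",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in version_mismatches(REPORTED_VERSIONS).items():
        print(f"{package}: reported {wanted}, installed {found or 'not installed'}")
```

The comparison is an exact string match, so a different CUDA build of PyTorch (for example a `+cpu` wheel) is reported as a mismatch even if the base version is the same.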
