Name: pawin205/Qwen-7B-REMOR-GRPO-no-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: pawin205

Model Overview

pawin205/Qwen-7B-REMOR-GRPO-no-SFT is a 7.6-billion-parameter language model fine-tuned from the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model using the TRL library.

Key Differentiator: GRPO Training

The main distinction of this model is its training method. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement learning technique introduced in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300). GRPO is a variant of PPO that drops the separate value (critic) model. Instead, it samples a group of outputs for each prompt and uses the group's reward statistics as the baseline, which reduces training memory. The paper uses it to improve mathematical reasoning.
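The core of GRPO as described in the DeepSeekMath paper can be sketched in plain Python. This is a minimal, illustrative sketch, not the training code used for this model. The function names, the clipping range `clip_eps=0.2`, the KL weight `beta=0.04`, and the use of the population standard deviation are assumptions for illustration. It shows two pieces: the group-relative advantage (each reward normalized by its group's mean and standard deviation) and the per-token clipped surrogate term with a KL penalty toward a reference policy.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize the rewards of G completions sampled for the same prompt.

    The group mean acts as the baseline, so no value (critic) model is needed.
    Assumption: population standard deviation; eps avoids division by zero.
    """
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


def kl_estimate(logp_new: float, logp_ref: float) -> float:
    """Per-token KL estimator from the paper: ratio - log(ratio) - 1,
    with ratio = pi_ref / pi_theta. It is always >= 0."""
    log_ratio = logp_ref - logp_new
    return math.exp(log_ratio) - log_ratio - 1.0


def grpo_token_objective(
    logp_new: float,   # log-prob of the token under the current policy
    logp_old: float,   # log-prob under the policy that sampled the group
    logp_ref: float,   # log-prob under the frozen reference policy
    advantage: float,  # group-relative advantage of the whole completion
    clip_eps: float = 0.2,  # illustrative value
    beta: float = 0.04,     # illustrative value
) -> float:
    """Clipped surrogate minus a KL penalty, for a single token (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms, as in PPO.
    surrogate = min(ratio * advantage, clipped * advantage)
    return surrogate - beta * kl_estimate(logp_new, logp_ref)


if __name__ == "__main__":
    # Example: four sampled answers, two judged correct (reward 1) and two wrong.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

For rewards [1.0, 0.0, 0.0, 1.0] the group mean is 0.5 and the standard deviation is 0.5, so the advantages come out very close to [1, -1, -1, 1]: correct completions are pushed up and incorrect ones down. In the paper's full objective these per-token terms are averaged over each completion's tokens and then over the G completions in the group.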

Training Details

Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
Training Framework: TRL (Transformer Reinforcement Learning)
Methodology: GRPO, focused on enhancing mathematical reasoning.

Use Cases

This model is intended for applications that need mathematical problem solving and multi-step logical reasoning, such as tasks where the correctness of numerical and logical deductions is critical. Its GRPO training was aimed at this kind of reasoning.
