leonMW/DeepSeek-R1-Distill-Qwen-7B-GSPO-Basic

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Aug 9, 2025 · Architecture: Transformer

leonMW/DeepSeek-R1-Distill-Qwen-7B-GSPO-Basic is a 7.6-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. It was trained with the GRPO method, introduced in the DeepSeekMath paper, to enhance mathematical reasoning. The model is optimized for tasks requiring advanced reasoning and retains the base model's 32,768-token context length.


Model Overview

This model, leonMW/DeepSeek-R1-Distill-Qwen-7B-GSPO-Basic, is a fine-tuned variant of the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B base model. It was trained with GRPO (Group Relative Policy Optimization), a reinforcement-learning method originally introduced in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (arXiv:2402.03300).

Key Capabilities

  • Enhanced Reasoning: The GRPO training procedure targets improved reasoning ability, particularly in mathematical problem-solving, the domain for which the method was originally developed.
  • Instruction Following: As a fine-tuned model, it is designed to follow instructions effectively, making it suitable for various conversational and task-oriented applications.
  • Large Context Window: Maintains the base model's substantial context length of 32768 tokens, allowing it to process and generate longer, more coherent responses.
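The model can be used like any other causal language model on the Hub. The sketch below shows one plausible way to run inference with the Transformers library; the prompt, generation settings, and hardware note are illustrative assumptions, not part of this model card.

```python
# Minimal inference sketch using Hugging Face Transformers.
# The prompt and max_new_tokens value are illustrative, not from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "leonMW/DeepSeek-R1-Distill-Qwen-7B-GSPO-Basic"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Download the model, apply its chat template, and return a completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Example call (requires a GPU with enough memory for the 7.6B weights):
# print(generate("What is the sum of the first 100 positive integers?"))
```

The function wraps loading and generation together for brevity; in a serving setting you would load the model once and reuse it across calls.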

Training Details

The model was fine-tuned with the TRL library (version 0.23.1) on top of the DeepSeek-R1-Distill-Qwen-7B base model. The GRPO training aims to refine performance beyond the base model's capabilities, with a focus on robust and accurate outputs.