Model Overview
This model, s3171103/DeepSeek-R1-Distill-Qwen-14B-GRPO, is a 14-billion-parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-14B base model. It has been fine-tuned using the TRL framework, specifically incorporating the GRPO (Group Relative Policy Optimization) method.
Key Differentiator: GRPO Fine-tuning
The primary distinction of this model lies in its application of GRPO, a technique detailed in the research paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models". This fine-tuning approach is designed to significantly improve the model's performance in areas requiring:
- Mathematical Reasoning: Enhanced ability to understand and solve complex mathematical problems.
- Logical Deduction: Improved capacity for structured thinking and inference.
- Problem-Solving: Better performance on tasks that demand multi-step reasoning.
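The core idea of GRPO, as described in the DeepSeekMath paper, is to estimate advantages without a separate value model: several completions are sampled for each prompt, and each completion's reward is normalized against the others in its group. The sketch below illustrates that group-relative normalization with hypothetical rewards; it is a simplified illustration, not this model's actual training code (implementations such as TRL's differ in details like the choice of standard-deviation estimator).

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group: (r - mean) / std.

    GRPO samples a group of completions per prompt and uses this
    normalized score as the advantage, avoiding a learned value model.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sketch-level choice
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical binary rewards for 4 completions of one math prompt
# (e.g., 1.0 for a correct final answer, 0.0 otherwise).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the scores are centered within the group, correct completions receive positive advantages and incorrect ones negative, so the policy update pushes probability mass toward the better responses sampled for that same prompt.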
Technical Specifications
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- Parameter Count: 14 billion
- Context Length: 32768 tokens
- Training Frameworks: TRL (version 0.18.0.dev0), Transformers (version 4.52.0.dev0), PyTorch (version 2.6.0), Datasets (version 3.6.0), Tokenizers (version 0.21.1).
Use Cases
This model is particularly well-suited for applications where robust mathematical and logical reasoning capabilities are crucial. Developers can leverage it for tasks such as:
- Generating solutions to mathematical queries.
- Assisting in scientific research requiring complex calculations.
- Developing intelligent agents that need to perform multi-step logical deductions.
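For mathematical tasks like these, reward signals in GRPO-style pipelines are often rule-based: extract the final answer from a completion and compare it to a reference. The sketch below is a hypothetical example of such a reward function; the `\boxed{}` answer convention and exact-match scoring are assumptions for illustration, not details of this model's documented training setup.

```python
import re

def extract_boxed_answer(text):
    """Return the last \\boxed{...} answer in a completion, if any."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

def exact_match_reward(completion, reference):
    """1.0 if the extracted final answer matches the reference, else 0.0."""
    answer = extract_boxed_answer(completion)
    return 1.0 if answer == reference else 0.0

# Hypothetical completion for the prompt "What is 2 + 3?"
reward = exact_match_reward(
    "Adding the terms gives 2 + 3 = 5, so the answer is \\boxed{5}.", "5"
)
```

A verifiable reward like this avoids training a separate reward model, though a production setup would typically add answer canonicalization (e.g., stripping whitespace or normalizing fractions) before comparison.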