sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Oct 9, 2025 · Architecture: Transformer · Status: Warm
sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The fine-tuning uses GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. Building on its base model's foundation, it is optimized for mathematical reasoning, and is intended for applications that require robust mathematical problem-solving.
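Since the card names GRPO (Group Relative Policy Optimization) as the fine-tuning method, here is a minimal sketch of its core idea from the DeepSeekMath paper: advantages are computed relative to a group of sampled completions rather than from a learned value model. The function name and the epsilon constant are illustrative choices, not taken from this model's actual training code.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages for one group of sampled completions:
    standardize each completion's scalar reward against the mean and
    (population) std of its own group, so no critic network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Epsilon (illustrative) guards against a group of identical rewards.
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: 4 sampled answers to one math problem, reward 1.0 if correct.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Correct answers in a mostly-wrong group get a large positive advantage, which is what pushes the policy toward better mathematical reasoning.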
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, min_p (values not captured in this export).