Name: yuerxin/DeepSeek-R1-Distill-Qwen-1.5B-GRPO API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: yuerxin

Model Overview

Sayram/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5 billion parameter language model derived from the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B base model. Its primary distinction lies in its specialized training for mathematical reasoning, utilizing the open-r1/OpenR1-Math-220k dataset.

Key Capabilities

Enhanced Mathematical Reasoning: The model was fine-tuned using the GRPO (Guided Reasoning Policy Optimization) method, as introduced in the DeepSeekMath paper. This method is designed to push the limits of mathematical problem-solving in language models.
Specialized Training: Its training on a dedicated mathematical dataset makes it particularly adept at understanding and generating responses for quantitative problems.
High Context Length: Features a substantial context window of 131072 tokens, allowing it to process and reason over lengthy mathematical descriptions and complex problem statements.

When to Use This Model

This model is particularly well-suited for applications requiring strong mathematical reasoning capabilities, such as:

Solving complex math problems.
Generating explanations for mathematical concepts.
Assisting in educational tools focused on mathematics.
Any use case where robust quantitative analysis and logical deduction are paramount.