sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Text generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Oct 9, 2025 · Architecture: Transformer

sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It is trained with GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper, to strengthen the base model's mathematical reasoning. The model is intended for applications that require robust mathematical problem-solving.
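Like other DeepSeek-R1 distillations, the base model emits its chain of thought inside `<think>…</think>` tags before the final answer. A minimal post-processing sketch for separating the two (the helper name and fallback behavior are illustrative assumptions, not part of this model card):

```python
def split_reasoning(output: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, final answer).

    Assumes the model wraps its chain of thought in <think>...</think>;
    if no closing tag is found, the whole output is treated as the answer.
    """
    marker = "</think>"
    idx = output.find(marker)
    if idx == -1:
        return "", output.strip()
    reasoning = output[:idx].replace("<think>", "", 1).strip()
    answer = output[idx + len(marker):].strip()
    return reasoning, answer


# Example with a mock completion (not real model output):
reasoning, answer = split_reasoning("<think>2 + 2 is 4.</think>The answer is 4.")
print(answer)  # → The answer is 4.
```

Keeping the reasoning and the answer separate makes it easy to display only the final result while still logging the chain of thought for inspection.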


Popular Sampler Settings

The three parameter combinations most used by Featherless users for this model draw on the following sampler parameters:

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p
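These parameters map directly onto the keyword arguments of a typical text-generation request. A minimal sketch that bundles them into a config dict and sanity-checks their ranges (the default values are illustrative assumptions only; the actual top configurations for this model are not reproduced here):

```python
def make_sampler_config(**overrides) -> dict:
    """Build a sampler-settings dict with placeholder defaults.

    The defaults below are assumptions for demonstration, not the
    configurations reported for this model.
    """
    config = {
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 40,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "repetition_penalty": 1.0,
        "min_p": 0.0,
    }
    config.update(overrides)

    # Basic range checks before sending the config to an inference API.
    if config["temperature"] < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < config["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if not 0 <= config["min_p"] <= 1:
        raise ValueError("min_p must be in [0, 1]")
    return config


settings = make_sampler_config(temperature=0.8)
print(settings["temperature"])  # → 0.8
```

The resulting dict can be unpacked into whatever client is in use, e.g. `client.completions.create(..., **settings)` for an OpenAI-compatible endpoint.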