sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO
Text generation · Concurrency cost: 1 · Model size: 1.5B · Quant: BF16 · Context length: 32k · Published: Oct 9, 2025 · Architecture: Transformer · Status: Warm
sunmengjie/DeepSeek-R1-Distill-Qwen-1.5B-GRPO is a 1.5 billion parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The fine-tuning uses GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. Building on its base model's foundation, it is optimized for mathematical reasoning, and is intended for applications that require robust mathematical problem-solving.
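Since the card names GRPO (Group Relative Policy Optimization) as the fine-tuning method, here is a minimal sketch of its core idea from the DeepSeekMath paper: advantages are computed relative to a group of sampled completions rather than from a learned value model. The function name and the epsilon constant are illustrative choices, not taken from this model's actual training code.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages for one group of sampled completions:
    standardize each completion's scalar reward against the mean and
    (population) std of its own group, so no critic network is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # Epsilon (illustrative) guards against a group of identical rewards.
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Example: 4 sampled answers to one math problem, reward 1.0 if correct.
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

Correct answers in a mostly-wrong group get a large positive advantage, which is what pushes the policy toward better mathematical reasoning.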
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, min_p (values not captured in this export).