s3171103/DeepSeek-R1-Distill-Qwen-14B-GRPO
Text Generation | Concurrency Cost: 1 | Model Size: 14B | Quant: FP8 | Ctx Length: 32k | Architecture: Transformer | Cold

s3171103/DeepSeek-R1-Distill-Qwen-14B-GRPO is a 14-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B using GRPO (Group Relative Policy Optimization), the reinforcement-learning method introduced in the DeepSeekMath paper. The fine-tuning targets advanced mathematical reasoning and complex problem-solving, building on the base model's distilled reasoning foundation. With a context length of 32,768 tokens, the model is suitable for processing extensive inputs.
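The core idea of GRPO, as described in the DeepSeekMath paper, is to drop PPO's learned value critic and instead baseline each sampled completion against the other completions drawn for the same prompt: the advantage of a response is its reward normalized by the mean and standard deviation of its group's rewards. A minimal sketch of that advantage computation (illustration only, not the actual training code used for this model):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages for one group of sampled completions:
    each reward is normalized by the group's mean and std deviation,
    so no separate value network is needed as a baseline."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored identically; no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: three completions for one math prompt, scored by a reward model.
advantages = group_relative_advantages([0.2, 0.5, 0.9])
```

Completions scoring above the group mean get positive advantages (their tokens are reinforced), those below get negative ones, which is what makes the method well suited to verifiable domains like math where rewards are cheap to compute per sample.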
