SoheylM/DeepSeek-R1-Distill-Qwen-14B-GRPO
Text generation · Concurrency cost: 1 · Model size: 14B · Quant: FP8 · Context length: 32k · Architecture: Transformer

SoheylM/DeepSeek-R1-Distill-Qwen-14B-GRPO is a 14-billion-parameter language model fine-tuned from deepseek-ai/DeepSeek-R1-Distill-Qwen-14B. It was trained with the GRPO method on the IDEALLab/OpenR1-EPS-5k dataset, which is designed to enhance mathematical reasoning. The model is optimized for tasks requiring advanced mathematical problem-solving and logical deduction, and supports a 32,768-token context length.
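A minimal usage sketch, assuming the standard Hugging Face `transformers` API; the `build_prompt` helper and its instruction text are illustrative, not part of the model card, and the exact chat template should be checked against the model's tokenizer configuration.

```python
# Hypothetical usage sketch for SoheylM/DeepSeek-R1-Distill-Qwen-14B-GRPO.
# Assumes the standard transformers API; prompt wording is illustrative.

def build_prompt(question: str) -> str:
    """Wrap a math question in a simple step-by-step reasoning instruction.

    The instruction text here is an assumption, not taken from the model card.
    """
    return (
        "Please reason step by step, and put your final answer in \\boxed{}.\n"
        f"Question: {question}"
    )

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "SoheylM/DeepSeek-R1-Distill-Qwen-14B-GRPO"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [{"role": "user", "content": build_prompt("What is 7 * 8?")}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    # The model supports a 32,768-token context; keep generation well inside it.
    outputs = model.generate(inputs, max_new_tokens=1024)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Because the model targets mathematical reasoning, prompting it to place the final answer in `\boxed{}` (as in common math-benchmark conventions) makes the output easier to parse programmatically.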
