yuerxin/DeepSeek-R1-Distill-Qwen-1.5B
Text Generation · Concurrency Cost: 1 · Model Size: 1.5B · Quant: BF16 · Ctx Length: 32k · Published: Sep 22, 2025 · License: MIT · Architecture: Transformer · Open Weights

DeepSeek-R1-Distill-Qwen-1.5B is a fine-tuned language model based on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. It was trained with a learning rate of 1e-05 for 3 epochs using a cosine learning-rate scheduler. Specific capabilities and intended uses are not documented, but the base model is a 1.5-billion-parameter distilled version of DeepSeek-R1, suggesting a focus on efficient performance on general language tasks.
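The training setup above (base learning rate 1e-05 decayed over 3 epochs with a cosine scheduler) can be sketched as a small standalone function. This is a minimal illustration of the standard cosine-decay formula, not the exact schedule used for this fine-tune; the `min_lr` floor and the absence of a warmup phase are assumptions.

```python
import math

def cosine_lr(step, total_steps, base_lr=1e-5, min_lr=0.0):
    """Cosine-decayed learning rate: starts at base_lr, ends at min_lr.

    base_lr matches the 1e-05 reported for this fine-tune;
    min_lr=0.0 and the lack of warmup are assumptions.
    """
    progress = step / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# At step 0 the rate equals base_lr; halfway through it is base_lr / 2;
# at the final step it has decayed to min_lr.
print(cosine_lr(0, 1000))     # 1e-05
print(cosine_lr(500, 1000))   # 5e-06
print(cosine_lr(1000, 1000))  # ~0.0
```

In practice a trainer would call this once per optimizer step, with `total_steps` equal to steps-per-epoch times 3 epochs.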
