ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation
Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Jul 22, 2025 · Architecture: Transformer

ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation is a 1.7 billion parameter language model built on the Qwen 3 architecture. It was trained by distillation on a dataset derived from DeepSeek R1 0528. It supports an extended context length of 40,960 tokens, making it suitable for tasks that require extensive contextual understanding.
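A minimal usage sketch, assuming the checkpoint loads with the standard Hugging Face transformers Auto classes; the page itself does not document a loading API, so the details below are illustrative:

```python
# Minimal sketch, assuming the repository works with the standard
# Hugging Face transformers Auto classes (not confirmed by this page).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ertghiu256/Qwen-3-1.7b-deepseek-r1-0528-distillation"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # matches the BF16 quant listed above
)

# Build a chat prompt and generate a short completion.
messages = [{"role": "user", "content": "Summarize model distillation in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

On a GPU, passing `device_map="auto"` to `from_pretrained` (with `accelerate` installed) places the weights automatically.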
