unsloth/DeepSeek-R1-0528-Qwen3-8B
TEXT GENERATION | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: May 29, 2025 | License: MIT | Architecture: Transformer | Open Weights
DeepSeek-R1-0528-Qwen3-8B is an 8-billion-parameter language model from DeepSeek AI, built on the Qwen3 architecture with a 32,768-token context length. It was post-trained by distilling the chain-of-thought of the larger DeepSeek-R1-0528 model, which substantially improves its reasoning, particularly in mathematics and programming. It achieves state-of-the-art results among open-source models on benchmarks such as AIME 2024, making it well suited to complex reasoning tasks and code generation.
Popular Sampler Settings
The three sampler-parameter combinations most used by Featherless users for this model. Each configuration tunes the following parameters:
temperature: scales the output distribution; lower values make sampling more deterministic
top_p: nucleus sampling; keeps the smallest token set whose cumulative probability exceeds p
top_k: restricts sampling to the k most likely tokens
frequency_penalty: penalizes tokens in proportion to how often they have already appeared
presence_penalty: penalizes any token that has already appeared at least once
repetition_penalty: multiplicative penalty applied to repeated tokens
min_p: discards tokens whose probability falls below a fraction of the most likely token's probability
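These parameters map onto the fields of an OpenAI-style chat-completions request. Below is a minimal sketch of building such a payload for this model; the endpoint conventions follow the common OpenAI-compatible schema, and the specific parameter values are illustrative assumptions, not the published top configurations:

```python
# Sketch: assembling a chat-completions payload for
# unsloth/DeepSeek-R1-0528-Qwen3-8B with explicit sampler settings.
# All sampler values below are illustrative, not the actual top-3 configs.

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload with sampler settings."""
    return {
        "model": "unsloth/DeepSeek-R1-0528-Qwen3-8B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        # Sampler settings (illustrative values):
        "temperature": 0.6,          # softer/sharper token distribution
        "top_p": 0.95,               # nucleus-sampling cumulative cutoff
        "top_k": 40,                 # keep only the 40 most likely tokens
        "frequency_penalty": 0.0,    # penalize by prior frequency
        "presence_penalty": 0.0,     # penalize any prior appearance
        "repetition_penalty": 1.05,  # multiplicative penalty on repeats
        "min_p": 0.05,               # drop tokens under 5% of top token's prob
    }

payload = build_request("Prove that the sum of two even integers is even.")
print(payload["model"])
```

Note that `repetition_penalty` and `min_p` are not part of the core OpenAI schema; they are extensions accepted by many open-model inference servers, so support depends on the backend serving the model.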