Nitish-Garikoti/DeepSeek-R1-Distill-Llama-8B
Task: Text Generation
Concurrency Cost: 1
Model Size: 8B
Quantization: FP8
Context Length: 32k
Published: Mar 29, 2026
License: MIT
Architecture: Transformer (open weights)

DeepSeek-R1-Distill-Llama-8B is an 8-billion-parameter language model developed by DeepSeek AI. It was distilled from the larger DeepSeek-R1 model and is based on Llama-3.1-8B. It offers a 32,768-token context length and is optimized for reasoning tasks across math, code, and general problem-solving. The model demonstrates that advanced reasoning capabilities can be transferred effectively to smaller, dense architectures through distillation.
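DeepSeek-R1 distilled models typically emit their chain-of-thought between `<think>` and `</think>` tags before the final answer. A minimal sketch of post-processing such output, assuming that tag convention (the sample string below is illustrative, not real model output):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Split a model completion into (reasoning trace, final answer).

    If no <think>...</think> block is found, the reasoning part is empty
    and the whole output is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if m:
        reasoning = m.group(1).strip()
        answer = output[m.end():].strip()
    else:
        reasoning, answer = "", output.strip()
    return reasoning, answer

# Illustrative completion in the R1 distill style:
sample = "<think>2 + 2 equals 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(sample)
print(reasoning)  # → 2 + 2 equals 4.
print(answer)     # → The answer is 4.
```

Separating the trace from the answer like this is useful when only the final answer should be shown to end users while the reasoning is logged for debugging.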
