Gleb1983/DeepSeek-R1-Distill-Llama-70B

Text Generation | Model Size: 70B | Quant: FP8 | Context Length: 32k | Concurrency Cost: 4 | Published: Apr 13, 2026 | License: MIT | Architecture: Transformer | Open Weights

Gleb1983/DeepSeek-R1-Distill-Llama-70B is a 70-billion-parameter language model developed by DeepSeek-AI, distilled from the DeepSeek-R1 reasoning model and built on Llama-3.3-70B-Instruct. It is fine-tuned on reasoning traces generated by the larger DeepSeek-R1, transferring advanced reasoning capabilities to a smaller, dense architecture. It performs strongly on math, code, and general reasoning benchmarks, making it well suited to applications that demand robust analytical problem-solving.


DeepSeek-R1-Distill-Llama-70B: Reasoning Distilled

This model is a 70-billion-parameter variant from the DeepSeek-R1-Distill series, developed by DeepSeek-AI. It is based on the Llama-3.3-70B-Instruct architecture and has been fine-tuned on reasoning data generated by the larger DeepSeek-R1 model. The core idea behind the DeepSeek-R1-Distill models is to transfer the complex reasoning patterns discovered through large-scale reinforcement learning (RL) on DeepSeek-R1 into more compact, dense models.
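
As a minimal usage sketch, the model loads like any Llama-family checkpoint via Hugging Face transformers. The repo ID below is taken from this card; the dtype, device mapping, and sampling settings are assumptions (DeepSeek recommends a temperature around 0.5-0.7 for the R1-Distill models), so adjust them to your hardware:

```python
# Minimal sketch: loading the distilled model with Hugging Face transformers.
# A 70B checkpoint needs multiple GPUs; device_map="auto" shards it across
# whatever devices are visible.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Gleb1983/DeepSeek-R1-Distill-Llama-70B"  # repo ID from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype; the card lists an FP8 quant
    device_map="auto",
)

# R1-Distill models emit their chain of thought inside <think>...</think> tags.
messages = [{"role": "user", "content": "What is 17 * 23? Reason step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,  # within DeepSeek's recommended 0.5-0.7 range
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```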

Key Capabilities

  • Enhanced Reasoning: Benefits from distillation of advanced reasoning patterns, particularly strong in mathematical and coding tasks.
  • Strong Benchmark Performance: Achieves competitive results on benchmarks such as AIME 2024 (70.0 pass@1), MATH-500 (94.5 pass@1), GPQA Diamond (65.2 pass@1), and LiveCodeBench (57.5 pass@1).
  • Llama-Based Architecture: Leverages the widely adopted Llama architecture for broad compatibility with standard inference stacks (see the serving sketch after this list).
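
Because the model uses the standard Llama layout, it can be served with off-the-shelf engines. Below is a hedged sketch using vLLM's offline Python API; the tensor_parallel_size and max_model_len values are illustrative assumptions chosen to match the 32k context listed above, not prescribed settings:

```python
# Sketch: offline batch inference with vLLM. Assumes a multi-GPU node.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Gleb1983/DeepSeek-R1-Distill-Llama-70B",  # repo ID from this card
    tensor_parallel_size=4,   # shard the 70B weights across 4 GPUs (assumed)
    max_model_len=32768,      # matches the 32k context length above
)

params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=2048)
# For chat-formatted prompts, apply the tokenizer's chat template first;
# a raw prompt is used here only to keep the sketch short.
outputs = llm.generate(["Prove that the sum of two even numbers is even."], params)
print(outputs[0].outputs[0].text)
```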

Good For

  • Complex Problem Solving: Ideal for applications requiring robust analytical, step-by-step reasoning; a sketch for separating the model's reasoning trace from its final answer follows this list.
  • Mathematical and Coding Tasks: Excels in domains like competitive programming and advanced mathematics.
  • Research and Development: Provides a powerful, distilled model for further research into reasoning capabilities and efficient deployment.
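
For applications that consume the model's step-by-step reasoning, a small post-processing helper is often useful. This sketch assumes the R1-series convention of wrapping the chain of thought in <think>...</think> tags; the function name and sample string are illustrative:

```python
# Sketch: splitting an R1-style completion into its reasoning trace and final
# answer, assuming the <think>...</think> convention of the DeepSeek-R1 series.
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block."""
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

sample = "<think>17 * 23 = 17 * 20 + 17 * 3 = 340 + 51 = 391.</think>The answer is 391."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```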