mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1

Public · 8B parameters · FP8 · 32,768-token context
Released: Jan 29, 2025 · License: MIT
Overview

mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1 is an enhanced, re-distilled version of the original DeepSeek-R1-Distill-Llama3-8B model. This 8-billion-parameter model improves general performance and reasoning capability through a re-distillation process.

Key Performance Improvements

This model shows significant gains over its predecessor across several key benchmarks:

  • MMLU (5-shot): Improved from 56.87 to 58.78.
  • TruthfulQA-MC2: Increased from 50.53 to 51.94.
  • Winogrande (5-shot): Rose from 68.11 to 70.25.
  • GSM8K (5-shot): A substantial jump from 61.79 to 75.66, indicating stronger mathematical reasoning.
  • GPQA (0-shot): Improved from 29 to 33.98.
  • BBH (3-shot): Enhanced from 41.57 to 49.59.

Usage and Optimization

The model can be loaded directly with the Hugging Face transformers library. It also supports optimization with the HQQ library, with reported inference speedups of up to 3.5x when using quantization backends such as torchao_int4.
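As a rough illustration, the transformers loading path can be sketched as below. This is a minimal sketch, not the card's official snippet: the `device_map`/`torch_dtype` settings, the helper function names, and the example prompt are all illustrative assumptions; only the model ID comes from this card.

```python
# Minimal sketch of loading and prompting the model with transformers.
# The helper names and generation settings are illustrative, not official.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1"

def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model; dtype/device placement are left automatic."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # use the checkpoint's stored precision
        device_map="auto",    # spread layers across available devices
    )
    return tokenizer, model

def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Format a single-turn chat prompt and decode only the new tokens."""
    tokenizer, model = load_model()
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(
        output[0][inputs.shape[-1]:], skip_special_tokens=True
    )

if __name__ == "__main__":
    # Example query exercising the model's math reasoning (GSM8K-style).
    print(generate("A book costs $12 and a pen costs $3. How much do 2 books and 4 pens cost?"))
```

For HQQ-quantized inference, consult the HQQ library's own documentation for the current quantization and loading API rather than adapting this sketch.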

Good For

  • Applications requiring improved general reasoning and knowledge understanding.
  • Tasks benefiting from enhanced mathematical problem-solving, as evidenced by its strong GSM8K score.
  • Developers looking for an 8B parameter model with competitive benchmark performance and optimization options for faster inference.