Overview
mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1 is an enhanced, re-distilled version of the original DeepSeek-R1-Distill-Llama3-8B model. This 8-billion-parameter model targets improved general performance and reasoning through a re-distillation process.
Key Performance Improvements
This model shows significant gains over its predecessor across several key benchmarks:
- MMLU (5-shot): Improved from 56.87 to 58.78.
- TruthfulQA-MC2: Increased from 50.53 to 51.94.
- Winogrande (5-shot): Rose from 68.11 to 70.25.
- GSM8K (5-shot): A substantial jump from 61.79 to 75.66, indicating stronger mathematical reasoning.
- GPQA (0-shot): Improved from 29 to 33.98.
- BBH (3-shot): Enhanced from 41.57 to 49.59.
Usage and Optimization
The model integrates with the Hugging Face transformers library in the usual way. It also supports quantization with the HQQ library, with reported inference speedups of up to 3.5x using backends such as torchao_int4.
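A minimal usage sketch is below, assuming the standard transformers chat API. The helper names (`build_prompt`, `generate`, `load_quantized`) and the generation and quantization settings are illustrative assumptions, not part of the model card.

```python
MODEL_ID = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Llama3-8B-v1.1"

def build_prompt(question: str) -> list:
    # Chat-format message list expected by tokenizer.apply_chat_template.
    return [{"role": "user", "content": question}]

def generate(question: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    input_ids = tokenizer.apply_chat_template(
        build_prompt(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )

def load_quantized():
    # Optional 4-bit quantization via transformers' HqqConfig integration.
    # This is an assumption for illustration; the reported ~3.5x speedup uses
    # HQQ with a torchao_int4 backend, which may require additional setup.
    from transformers import AutoModelForCausalLM, HqqConfig

    quant_config = HqqConfig(nbits=4, group_size=64)
    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", quantization_config=quant_config
    )
```

Loading an 8B model requires substantial GPU memory; quantized loading trades a small amount of accuracy for a much lower footprint and faster inference.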
Good For
- Applications requiring improved general reasoning and knowledge-intensive tasks.
- Tasks benefiting from enhanced mathematical problem-solving, as evidenced by its strong GSM8K score.
- Developers looking for an 8B parameter model with competitive benchmark performance and optimization options for faster inference.