The neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 is an 8 billion parameter Llama-3.1-based model developed by Neural Magic, optimized with 2:4 sparsity. This model is specifically fine-tuned for grade-school mathematical reasoning tasks, achieving 66.9% 0-shot accuracy on the GSM8k benchmark. It demonstrates strong performance in math problem-solving while maintaining sparsity benefits, making it suitable for efficient deployment in math-intensive applications.
Model Overview
The neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 is an 8 billion parameter language model built on the Llama-3.1 architecture, developed by Neural Magic. This model is distinguished by its 2:4 sparsity optimization, meaning that in each group of four weights within transformer blocks, two are retained and two are pruned. It is specifically fine-tuned on the GSM8k dataset to excel at grade-school mathematical reasoning.
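The 2:4 pattern described above can be illustrated with a minimal, hypothetical check (not part of Neural Magic's tooling) that verifies every contiguous group of four weights contains at least two zeros:

```python
def is_2of4_sparse(weights):
    """Return True if every contiguous group of 4 weights has >= 2 zeros.

    `weights` is a flat list whose length is a multiple of 4. This is an
    illustrative sketch of the 2:4 constraint, not a real validation API.
    """
    assert len(weights) % 4 == 0, "length must be a multiple of 4"
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        zeros = sum(1 for w in group if w == 0.0)
        if zeros < 2:
            return False
    return True

# Satisfies 2:4 — each group of four keeps at most two nonzero weights:
print(is_2of4_sparse([0.0, 0.0, 1.2, -0.5, 0.3, 0.0, 0.0, 0.9]))  # True
# Violates 2:4 — the first group has three nonzero weights:
print(is_2of4_sparse([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.5, 0.5]))   # False
```

In practice this pattern is what allows NVIDIA's sparse tensor cores (and sparsity-aware inference backends) to skip half of the multiply-accumulates per group.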
Key Capabilities & Performance
- Specialized Math Reasoning: Achieves 66.9% 0-shot accuracy on the GSM8k benchmark, outperforming its dense counterpart, Llama-3.1-8B-gsm8k, which scores 66.3%.
- Sparsity with Accuracy Recovery: Demonstrates over 100% accuracy recovery compared to the dense fine-tuned model, indicating that the sparsity optimization does not compromise performance in its specialized domain.
- Efficient Deployment: Inherits the 2:4 sparsity pattern from its parent model, Sparse-Llama-3.1-8B-2of4, making it suitable for efficient inference, particularly when deployed with sparsity-aware backends like vLLM.
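As a rough deployment sketch, the model can be served with vLLM's OpenAI-compatible server. Exact flags depend on your vLLM version and GPU (2:4 acceleration requires NVIDIA Ampere or newer); treat this as a starting point, not an authoritative recipe:

```shell
# Hypothetical deployment sketch; verify flags against your vLLM version.
pip install vllm
vllm serve neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 --dtype auto
```

Once running, the server exposes the standard OpenAI-style `/v1/completions` endpoint, so existing client code can be pointed at it with only a base-URL change.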
Good For
- Mathematical Problem Solving: Ideal for applications requiring accurate solutions to grade-school level math problems.
- Resource-Efficient AI: Suitable for scenarios where computational efficiency and reduced memory footprint are critical, without significant loss in specialized task performance.
- Benchmarking Sparse Models: Useful for researchers and developers exploring the efficacy of sparsity techniques in specialized LLMs.