The neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 is an 8 billion parameter Llama-3.1-based model developed by Neural Magic, optimized with 2:4 sparsity. This model is specifically fine-tuned for grade-school mathematical reasoning tasks, achieving 66.9% 0-shot accuracy on the GSM8k benchmark. It demonstrates strong performance in math problem-solving while maintaining sparsity benefits, making it suitable for efficient deployment in math-intensive applications.
Model Overview
The neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 is an 8 billion parameter language model built on the Llama-3.1 architecture, developed by Neural Magic. This model is distinguished by its 2:4 sparsity optimization, meaning that in each group of four weights within transformer blocks, two are retained and two are pruned. It is specifically fine-tuned on the GSM8k dataset to excel at grade-school mathematical reasoning.
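The 2:4 pattern described above can be illustrated with a minimal, hypothetical check (not part of Neural Magic's tooling) that verifies every contiguous group of four weights contains at least two zeros:

```python
def is_2of4_sparse(weights):
    """Return True if every contiguous group of 4 weights has >= 2 zeros.

    `weights` is a flat list whose length is a multiple of 4. This is an
    illustrative sketch of the 2:4 constraint, not a real validation API.
    """
    assert len(weights) % 4 == 0, "length must be a multiple of 4"
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        zeros = sum(1 for w in group if w == 0.0)
        if zeros < 2:
            return False
    return True

# Satisfies 2:4 — each group of four keeps at most two nonzero weights:
print(is_2of4_sparse([0.0, 0.0, 1.2, -0.5, 0.3, 0.0, 0.0, 0.9]))  # True
# Violates 2:4 — the first group has three nonzero weights:
print(is_2of4_sparse([1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.5, 0.5]))   # False
```

In practice this pattern is what allows NVIDIA's sparse tensor cores (and sparsity-aware inference backends) to skip half of the multiply-accumulates per group.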
Key Capabilities & Performance
- Specialized Math Reasoning: Achieves 66.9% 0-shot accuracy on the GSM8k benchmark, outperforming its dense counterpart, Llama-3.1-8B-gsm8k, which scores 66.3%.
- Sparsity with Accuracy Recovery: Demonstrates over 100% accuracy recovery compared to the dense fine-tuned model, indicating that the sparsity optimization does not compromise performance in its specialized domain.
- Efficient Deployment: Inherits the 2:4 sparsity pattern from its parent model, Sparse-Llama-3.1-8B-2of4, making it suitable for efficient inference, particularly when deployed with sparsity-aware backends like vLLM.
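As a rough deployment sketch, the model can be served with vLLM's OpenAI-compatible server. Exact flags depend on your vLLM version and GPU (2:4 acceleration requires NVIDIA Ampere or newer); treat this as a starting point, not an authoritative recipe:

```shell
# Hypothetical deployment sketch; verify flags against your vLLM version.
pip install vllm
vllm serve neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 --dtype auto
```

Once running, the server exposes the standard OpenAI-style `/v1/completions` endpoint, so existing client code can be pointed at it with only a base-URL change.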
Good For
- Mathematical Problem Solving: Ideal for applications requiring accurate solutions to grade-school level math problems.
- Resource-Efficient AI: Suitable for scenarios where computational efficiency and reduced memory footprint are critical, without significant loss in specialized task performance.
- Benchmarking Sparse Models: Useful for researchers and developers exploring the efficacy of sparsity techniques in specialized LLMs.