RedHatAI/Llama-2-7b-pruned70-retrained

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Mar 15, 2024 · Architecture: Transformer

The RedHatAI/Llama-2-7b-pruned70-retrained model is a 7 billion parameter Llama 2 variant developed by Neural Magic and Cerebras. This model has undergone significant pruning, achieving 70% sparsity, and was subsequently retrained on 150 billion tokens from SlimPajama. It is optimized for efficient deployment and fine-tuning through sparse transfer, offering a balance between performance and computational cost.


Overview

RedHatAI/Llama-2-7b-pruned70-retrained is a 7 billion parameter model based on the Llama 2 architecture, developed by Neural Magic and Cerebras. It distinguishes itself through its high sparsity, achieved by pruning parameters in one-shot passes with SparseGPT followed by extensive retraining. First, 50% of the parameters were pruned and the model was retrained on 50 billion tokens from SlimPajama while the sparsity mask was maintained. It was then pruned further to 70% sparsity and trained on an additional 100 billion tokens.
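The two-stage prune-and-retrain schedule can be sketched in plain Python. The snippet below uses simple magnitude pruning as a stand-in for SparseGPT (which selects weights using second-order information), so the pruning criterion, the toy weight vector, and the helper names are illustrative assumptions rather than the actual training code:

```python
import random

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Simplified stand-in for SparseGPT, which uses second-order
    information instead of raw magnitudes.
    """
    k = int(len(weights) * sparsity)          # number of weights to zero
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def sparsity_of(weights):
    return sum(1 for w in weights if w == 0.0) / len(weights)

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(1000)]  # toy "model"

# Stage 1: one-shot prune to 50% sparsity, then retrain (~50B tokens)
weights = magnitude_prune(weights, 0.50)
# ... retraining would happen here, preserving the zero mask ...

# Stage 2: prune further to 70%, train on another ~100B tokens
weights = magnitude_prune(weights, 0.70)
print(f"final sparsity: {sparsity_of(weights):.0%}")  # → final sparsity: 70%
```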

Key Capabilities

  • High Sparsity: Achieves 70% parameter sparsity, enabling more efficient inference and deployment.
  • Retrained Performance: Despite significant pruning, the model was retrained on 150 billion tokens (50B + 100B) from SlimPajama to maintain and recover performance.
  • Sparse Transfer: Designed to leverage its pre-sparsified structure for efficient fine-tuning on new data, reducing training times and computational costs.
  • Accelerated Inference: Compatible with specialized inference engines like nm-vllm and deepsparse for optimized performance.
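To give a rough sense of why 70% unstructured sparsity matters for deployment, the back-of-the-envelope sketch below compares dense FP16 storage against a hypothetical nonzeros-plus-bitmask layout. The storage formats and byte counts are illustrative assumptions; engines such as deepsparse and nm-vllm use their own optimized sparse formats and kernels:

```python
def dense_gb(n_params, bytes_per_param=2.0):
    """Dense FP16 storage in GB (2 bytes per weight)."""
    return n_params * bytes_per_param / 1e9

def bitmask_sparse_gb(n_params, sparsity, bytes_per_param=2.0):
    """Store only nonzero values plus a 1-bit presence mask per weight."""
    nnz = n_params * (1 - sparsity)
    return (nnz * bytes_per_param + n_params / 8) / 1e9

n = 7_000_000_000  # 7B parameters
print(f"dense FP16:           {dense_gb(n):.1f} GB")            # 14.0 GB
print(f"70% sparse (bitmask): {bitmask_sparse_gb(n, 0.70):.1f} GB")  # 5.1 GB
```

The naive dense baseline needs roughly 14 GB at FP16, while keeping only the ~2.1B nonzero weights plus a presence bitmask lands around 5 GB, which is the kind of headroom that makes resource-constrained deployment practical.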

Benchmarks

While pruning introduces some performance trade-offs compared to the original Llama-2-7b, the model shows competitive results, particularly in code generation:

  • HumanEval (pass@1): 14.4% (vs. 13.4% for Llama-2-7b)
  • MMLU (5-shot): 36.5% (vs. 46.9% for Llama-2-7b)
  • HellaSwag (0-shot): 74.1% (vs. 78.6% for Llama-2-7b)
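For a quick read on the trade-offs, the figures above can be expressed as percentage-point deltas against the dense Llama-2-7b baseline:

```python
# (pruned model score, dense Llama-2-7b baseline score), in percent
scores = {
    "HumanEval pass@1":  (14.4, 13.4),
    "MMLU 5-shot":       (36.5, 46.9),
    "HellaSwag 0-shot":  (74.1, 78.6),
}

for name, (pruned, base) in scores.items():
    print(f"{name}: {pruned - base:+.1f} pp vs Llama-2-7b")
```

This prints a +1.0 pp gain on HumanEval against drops of 10.4 pp on MMLU and 4.5 pp on HellaSwag.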

Good for

  • Resource-constrained environments: Its high sparsity makes it suitable for deployment where computational resources are limited.
  • Efficient fine-tuning: Ideal for users looking to fine-tune a Llama 2 base model with reduced computational overhead and faster training times through sparse transfer.
  • Applications requiring code generation: Shows a slight improvement over the base Llama-2-7b on the HumanEval benchmark.