wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20 is a 7-billion-parameter instruction-tuned language model, based on Mistral-7B-Instruct-v0.2, whose attention layers have been pruned to 20% sparsity with the Wanda method. Wanda prunes without any retraining, which keeps the workflow lightweight while trimming the model's compute and memory requirements for deployment. The pruned model retains the base model's instruction-following capabilities along with its Grouped-Query Attention and Sliding-Window Attention architecture.
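
The repository name above is the only detail taken from this page; the snippet below is a minimal sketch assuming the model exposes the standard Hugging Face `transformers` causal-LM interface like the base Mistral-7B-Instruct-v0.2, and that `transformers`, `torch`, and `accelerate` are installed. The prompt text is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so the 7B model fits on a single GPU
    device_map="auto",          # requires the accelerate package
)

# Mistral-Instruct models expect the [INST] ... [/INST] chat format;
# apply_chat_template builds it from a list of role/content messages.
messages = [
    {"role": "user", "content": "Summarize what weight pruning does in one sentence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```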