wang7776/Mistral-7B-Instruct-v0.2-sparsity-10

Task: Text Generation · Model Size: 7B · Quantization: FP8 · Context Length: 8k · Published: Dec 26, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

wang7776/Mistral-7B-Instruct-v0.2-sparsity-10 is a 7 billion parameter instruction-tuned causal language model, based on Mistral AI's Mistral-7B-Instruct-v0.2. This version has been pruned to 10% sparsity using the Wanda method, which aims to maintain competitive performance without retraining. It features Grouped-Query Attention, Sliding-Window Attention, and a Byte-fallback BPE tokenizer, making it suitable for efficient instruction-following tasks.


Overview

This model, wang7776/Mistral-7B-Instruct-v0.2-sparsity-10, is a 7 billion parameter instruction-tuned language model derived from Mistral AI's Mistral-7B-Instruct-v0.2. Its key differentiator is the application of the Wanda pruning method, which zeroes out 10% of the weights (10% sparsity) without any additional retraining or weight updates, while still aiming for competitive performance.
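
A minimal loading-and-generation sketch follows, assuming the checkpoint ships in the standard Hugging Face transformers format with the base model's tokenizer and chat template; the dtype and device settings are illustrative choices, not part of the published card:

```python
# Minimal usage sketch -- assumes the checkpoint loads with the standard
# transformers AutoModel API; adjust dtype/device_map for your hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Mistral-7B-Instruct-v0.2-sparsity-10"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; halves memory vs. fp32
    device_map="auto",          # spread layers across available devices
)

# Mistral-Instruct models use the [INST] ... [/INST] format; the tokenizer's
# chat template builds it from a message list.
messages = [{"role": "user", "content": "Explain weight pruning in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```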

Key Capabilities

  • Efficient Instruction Following: Built upon the Mistral-7B-Instruct-v0.2 base, it is designed to follow instructions effectively.
  • Optimized Architecture: Incorporates advanced architectural features like Grouped-Query Attention and Sliding-Window Attention for improved efficiency.
  • Reduced Size: The 10% sparsity can lead to a smaller model footprint and potentially faster inference compared to its dense counterpart (a verification sketch follows this list).
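
Because Wanda zeroes individual weights in place rather than changing tensor shapes, the claimed sparsity can be measured directly. The sketch below is an assumption-laden check, not part of the published card: it presumes pruned entries are stored as exact zeros in the attention and MLP projection matrices.

```python
# Sparsity check sketch -- assumes pruned weights are stored as literal zeros
# in the dense attention/MLP projection tensors (the layers Wanda targets).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "wang7776/Mistral-7B-Instruct-v0.2-sparsity-10", torch_dtype=torch.float16
)

zeros, total = 0, 0
for name, param in model.named_parameters():
    # Restrict to the linear projections Wanda prunes; embeddings and
    # the LM head are typically left dense.
    if ("self_attn" in name or "mlp" in name) and name.endswith(".weight"):
        zeros += (param == 0).sum().item()
        total += param.numel()

print(f"measured sparsity: {zeros / total:.2%}")  # expect roughly 10%
```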

When to Use This Model

  • Resource-Constrained Environments: Ideal for scenarios where computational resources or memory are limited, but instruction-following capabilities are still required.
  • Experimentation with Pruning: Useful for developers interested in exploring the performance of pruned models without extensive retraining.
  • General Instruction-Following: Suitable for a wide range of tasks that benefit from an instruction-tuned model, leveraging the base model's capabilities (a multi-turn usage sketch follows).
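
For instruction-following use, the base model's [INST] chat format carries over unchanged. A hedged multi-turn sketch, under the same loading assumptions as above:

```python
# Multi-turn instruction-following sketch -- same loading assumptions as the
# usage sketch earlier; the conversation content here is purely illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Mistral-7B-Instruct-v0.2-sparsity-10"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral chat templates expect strictly alternating user/assistant turns.
messages = [
    {"role": "user", "content": "Give three reasons to prune a language model."},
    {"role": "assistant", "content": "Smaller downloads, faster inference, lower memory use."},
    {"role": "user", "content": "Expand on the memory point."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=150, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```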