wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20
Text Generation | Concurrency Cost: 1 | Model Size: 7B | Quant: FP8 | Ctx Length: 8k | Published: Jan 25, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20 is a 7-billion-parameter instruction-tuned language model, based on Mistral-7B-Instruct-v0.2, whose attention layers have been pruned to 20% sparsity using the Wanda method. Wanda prunes without retraining or weight updates, with the aim of lowering computational requirements at deployment. The model is intended to retain the original model's instruction-following capabilities and uses Grouped-Query Attention and Sliding-Window Attention.


Overview

This model is derived from Mistral-7B-Instruct-v0.2. Its key differentiator is that the Wanda pruning method has been applied to its attention layers, bringing them to 20% sparsity. Wanda requires no retraining or weight updates, yet aims to keep performance competitive with the unpruned model while reducing the computational footprint.
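To make the pruning step concrete, here is a minimal pure-Python sketch of the Wanda scoring rule as it is commonly described: each weight is scored by its magnitude times the L2 norm of the matching input feature over a small calibration batch, and the lowest-scoring weights in each output row are set to zero. The function name wanda_prune and the toy matrices are hypothetical, chosen only for illustration. This is not the code used to produce this model.

```python
import math

def wanda_prune(weight, activations, sparsity=0.2):
    """Zero the lowest-scoring weights in each output row.

    weight:      list of rows, shape (out_features, in_features)
    activations: calibration inputs, shape (n_samples, in_features)
    sparsity:    fraction of weights to remove per row
    """
    in_features = len(weight[0])
    # L2 norm of every input feature across the calibration samples.
    norms = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(in_features)
    ]
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        # Wanda score: |weight| * input-feature norm.
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores in this row.
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy example: feature 1 has a small weight but large activations.
weight = [[0.5, -0.1, 0.3, 0.2, -0.4]]
activations = [[1.0, 10.0, 1.0, 1.0, 1.0], [1.0, 10.0, 1.0, 1.0, 1.0]]
print(wanda_prune(weight, activations, sparsity=0.2))
```

In the toy example, plain magnitude pruning would remove the -0.1 weight. The activation-aware score keeps it because its input feature is large, and removes the 0.2 weight instead.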

Key Characteristics

  • Pruned Architecture: 20% of the attention-layer weights are pruned (set to zero), which is intended to make inference more efficient.
  • Base Model: Built upon Mistral-7B-Instruct-v0.2, an improved instruction-tuned version of Mistral-7B-Instruct-v0.1.
  • Core Technologies: Incorporates Grouped-Query Attention and Sliding-Window Attention for efficient processing of longer contexts.
  • Instruction Following: Designed to respond to instructions, utilizing a specific [INST] and [/INST] token format for prompts.
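
The [INST] / [/INST] prompt format mentioned above can be sketched as a small helper. The function name build_prompt is hypothetical, and the <s> / </s> markers and the exact whitespace are assumptions based on the format published for the base Mistral instruct models. In practice the tokenizer's own chat template (for example, apply_chat_template in Hugging Face transformers) is the safer choice.

```python
def build_prompt(turns, bos="<s>", eos="</s>"):
    """Build a Mistral-style instruct prompt.

    turns: list of (user_message, assistant_reply) pairs; use None as the
           reply for the final turn the model is expected to answer.
    """
    parts = [bos]
    for user_message, assistant_reply in turns:
        # Each user message is wrapped in [INST] ... [/INST].
        parts.append(f"[INST] {user_message} [/INST]")
        if assistant_reply is not None:
            # A completed assistant turn is closed with the end-of-sequence marker.
            parts.append(f" {assistant_reply}{eos}")
    return "".join(parts)

prompt = build_prompt([
    ("What is your favourite condiment?", "Fresh lemon juice."),
    ("Do you have mayonnaise recipes?", None),
])
print(prompt)
```

The helper returns a plain string. If it is passed to a tokenizer that prepends its own beginning-of-sequence token, call it with bos="" to avoid duplicating that token.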

Use Cases

This model is suitable for applications where reduced model size and faster inference are critical, without significantly compromising the instruction-following capabilities of the original Mistral-7B-Instruct-v0.2. It's particularly useful for:

  • Resource-constrained environments: Deployments on devices with limited memory or processing power.
  • Instruction-based tasks: Generating responses to user prompts and following specific instructions.

Limitations

The model has no built-in moderation mechanisms. Users should implement their own guardrails for deployments that require moderated outputs.
