wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20

7B · FP8 · 8192 · License: apache-2.0

Overview

This model, wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20, is a 7-billion-parameter instruction-tuned language model derived from Mistral-7B-Instruct-v0.2. Its key differentiator is the application of the Wanda pruning method to its attention layers, inducing 20% sparsity. The pruning requires no retraining or weight updates, yet aims to maintain competitive performance while reducing the model's computational footprint.
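
The sketch below illustrates the Wanda scoring rule the description refers to. It is a minimal, illustrative example only, not the script used to produce this checkpoint: the layer, the 20% sparsity argument, and the `act_norm` calibration statistics are placeholders, and a real run would loop over the model's attention projection matrices.

```python
import torch

def wanda_prune_(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float = 0.2) -> None:
    """Illustrative Wanda-style pruning of one linear layer's weight matrix.

    weight:   (out_features, in_features), modified in place.
    act_norm: (in_features,) L2 norm of each input feature, gathered from a
              small calibration set (assumed to be available).
    """
    # Wanda scores each weight by |W_ij| * ||X_j||: weight magnitude times
    # the norm of the input activation that weight multiplies.
    scores = weight.abs() * act_norm.unsqueeze(0)

    # Zero the lowest-scoring fraction within each output row. No retraining
    # or weight update follows; the surviving weights are left untouched.
    k = int(sparsity * weight.shape[1])
    if k > 0:
        _, prune_idx = torch.topk(scores, k, dim=1, largest=False)
        weight.scatter_(1, prune_idx, 0.0)
```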

Key Characteristics

  • Pruned Architecture: Features 20% attention sparsity, making it more efficient for inference.
  • Base Model: Built upon Mistral-7B-Instruct-v0.2, an improved instruction-tuned version of Mistral-7B-Instruct-v0.1.
  • Core Technologies: Uses Grouped-Query Attention for faster, more memory-efficient inference; note that the v0.2 base, unlike v0.1, drops Sliding-Window Attention in favor of a longer native context window.
  • Instruction Following: Designed to respond to instructions, using the [INST] and [/INST] token format for prompts (see the prompt sketch after this list).
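
The snippet below shows one common way to produce that prompt format with the Hugging Face transformers chat-template API. It is a usage sketch under the assumption that this repository ships the standard Mistral Instruct chat template; the example messages are placeholders.

```python
from transformers import AutoTokenizer

# Assumes the repo ships the standard Mistral Instruct chat template,
# which wraps each user turn in [INST] ... [/INST].
tokenizer = AutoTokenizer.from_pretrained(
    "wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20"
)

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "I'm partial to a good mayonnaise."},
    {"role": "user", "content": "Do you have a recipe for it?"},
]

# Render the conversation into the raw prompt string the model expects,
# roughly "<s>[INST] ... [/INST] ... </s>[INST] ... [/INST]".
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
```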

Use Cases

This model is suitable for applications where reduced model size and faster inference are critical, without significantly compromising the instruction-following capabilities of the original Mistral-7B-Instruct-v0.2. It's particularly useful for:

  • Resource-constrained environments: Deployments on devices with limited memory or processing power (a loading sketch follows this list).
  • Instruction-based tasks: Generating responses to user prompts and following specific instructions.
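
As a concrete starting point, the following sketch loads the checkpoint in half precision and generates a response to an instruction. It assumes a transformers/accelerate environment with a GPU; since the 20% sparsity lives in the weights themselves, the checkpoint loads through the standard Mistral classes.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Mistral-7B-Instruct-v0.2-attention-sparsity-20"

# Half precision keeps the memory footprint down; device_map="auto"
# (via accelerate) places the weights on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain weight pruning in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```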

Limitations

Like the base Mistral-7B-Instruct-v0.2, this model has no built-in moderation mechanisms. Users should implement their own guardrails for deployments that require moderated outputs.