wang7776/vicuna-7b-v1.3-attention-sparsity-20
Text Generation | Concurrency cost: 1 | Model size: 7B | Quantization: FP8 | Context length: 4k | Published: Jan 25, 2024 | License: apache-2.0 | Architecture: Transformer | Open weights

wang7776/vicuna-7b-v1.3-attention-sparsity-20 is a 7 billion parameter auto-regressive language model based on Vicuna v1.3, a chat model developed by LMSYS. This variant has been pruned to 20% sparsity in its attention layers using the Wanda method, which aims to maintain competitive performance without any retraining. It is intended primarily for research and hobbyist use in natural language processing and chatbots, offering a more efficient alternative to the base Vicuna model.


Overview

This model, wang7776/vicuna-7b-v1.3-attention-sparsity-20, is a 7 billion parameter variant of the Vicuna v1.3 chat assistant, originally developed by LMSYS. Vicuna v1.3 was fine-tuned from LLaMA on approximately 125K user-shared conversations collected from ShareGPT.com. The key differentiator of this specific model is its 20% attention-layer sparsity, achieved with the Wanda pruning method: each weight is scored by the product of its magnitude and the norm of the corresponding input activations, and the lowest-scoring weights are zeroed out. Because Wanda requires no retraining or weight updates, the model can be compressed significantly while aiming to preserve performance and improve efficiency.
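The Wanda scoring rule is simple enough to sketch. The snippet below is a minimal, illustrative NumPy version and not the authors' actual implementation: the function name, the per-output-row pruning granularity, and the interfaces are assumptions. It scores each weight by its magnitude times the L2 norm of the matching input-activation feature, then zeroes the lowest-scoring fraction of weights in each row, with no weight update.

```python
import numpy as np

def wanda_prune(weight: np.ndarray, act_norms: np.ndarray,
                sparsity: float = 0.2) -> np.ndarray:
    """Illustrative Wanda-style pruning (assumed sketch, not the reference code).

    weight:    (out_features, in_features) weight matrix of a linear layer.
    act_norms: (in_features,) L2 norms of each input feature, measured on
               a small calibration set.
    sparsity:  fraction of weights to zero in each output row.
    """
    # Wanda score: |W_ij| * ||X_j||_2 (broadcasts the norms across rows).
    scores = np.abs(weight) * act_norms
    k = int(weight.shape[1] * sparsity)  # weights to drop per row
    if k == 0:
        return weight.copy()
    # Indices of the k lowest-scoring weights in each row.
    drop = np.argpartition(scores, k, axis=1)[:, :k]
    pruned = weight.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)  # zero them, no retraining
    return pruned
```

At 20% sparsity, a 10-column row keeps its 8 highest-scoring weights untouched; the surviving weights are never modified, which is what lets Wanda skip retraining entirely.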

Key Capabilities

  • Efficient Inference: Reduced computational load due to 20% sparsity in attention layers.
  • Chat Assistant: Designed for conversational AI tasks, inheriting Vicuna's chat capabilities.
  • Research & Development: Suitable for exploring sparse model architectures and their practical applications.
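Since the pruned checkpoint keeps the standard Vicuna v1.3 architecture, it should load through the usual Hugging Face `transformers` auto classes. The sketch below assumes that API and the plain USER/ASSISTANT prompt format Vicuna v1.3 expects; the system-prompt wording and generation settings are illustrative, not prescribed by the model card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "wang7776/vicuna-7b-v1.3-attention-sparsity-20"

# Vicuna v1.3 uses a plain USER/ASSISTANT prompt format rather than a
# special chat template; this system preamble is an illustrative example.
PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed answers to the user's questions. "
    "USER: Explain attention sparsity in one sentence. ASSISTANT:"
)

def generate(prompt: str = PROMPT, max_new_tokens: int = 128) -> str:
    """Load the pruned checkpoint and complete a single Vicuna-style prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Note that Wanda produces unstructured zeros in the weight matrices; without sparse-aware kernels, a dense runtime like the one above stores and multiplies those zeros as usual, so wall-clock speedups depend on the inference stack.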

Good For

  • Researchers and hobbyists experimenting with pruned language models.
  • Applications where computational efficiency matters and a modest performance trade-off is acceptable.
  • Studying the impact of attention sparsity on large language models.