wang7776/vicuna-7b-v1.3-attention-sparsity-30
Text Generation · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Jan 26, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

The wang7776/vicuna-7b-v1.3-attention-sparsity-30 is a 7 billion parameter Vicuna v1.3 model, developed by wang7776, that has been pruned to 30% sparsity in its attention layers using the Wanda method. This pruning technique reduces model size and computational requirements without requiring retraining or weight updates, while maintaining competitive performance. It is based on the LLaMA architecture and is primarily intended for research and hobbyist use in natural language processing and chatbot development.


Overview

This model, wang7776/vicuna-7b-v1.3-attention-sparsity-30, is a specialized version of the 7 billion parameter Vicuna v1.3 model. It has undergone 30% sparsity pruning specifically within its attention layers using the Wanda pruning method. A key advantage of Wanda is that it achieves significant model compression without any retraining or weight updates, while preserving competitive performance.
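To make the pruning idea concrete, here is a minimal sketch of the Wanda scoring rule: each weight is scored by its magnitude times the L2 norm of the corresponding input activation (gathered from a small calibration set), and the lowest-scoring fraction of weights in each output row is zeroed. This is an illustrative numpy sketch, not the authors' implementation; the function name and signature are assumptions.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.3):
    """Illustrative Wanda-style pruning sketch (not the reference code).

    W: weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (n_samples, in_features)
    sparsity: fraction of weights to zero per output row (e.g. 0.3 = 30%)
    """
    # Wanda score: |weight| times the L2 norm of its input feature,
    # broadcast across output rows
    metric = np.abs(W) * np.linalg.norm(X, axis=0)
    # Zero the k lowest-scoring weights in each output row
    k = int(W.shape[1] * sparsity)
    pruned = W.copy()
    idx = np.argsort(metric, axis=1)[:, :k]  # smallest scores per row
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

Because the score depends only on weights and a forward pass over calibration data, no gradients, retraining, or weight updates are needed, which is the property the model card highlights.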

Key Capabilities

  • Efficient Inference: Reduced computational requirements due to 30% sparsity in attention layers.
  • Chat Assistant: Based on the Vicuna v1.3 architecture, fine-tuned from LLaMA on user-shared conversations from ShareGPT.
  • Research & Development: Primarily intended for research and hobbyist exploration in large language models and chatbots.
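For research use, one practical sanity check is measuring how much of the attention weights are actually zero after pruning. The helper below is a hypothetical sketch: it assumes LLaMA-style parameter names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and takes a state dict mapping names to arrays.

```python
import numpy as np

def attention_sparsity(state_dict):
    """Fraction of exactly-zero entries across attention projection weights.

    Assumes LLaMA-style parameter naming (q_proj, k_proj, v_proj, o_proj);
    state_dict maps parameter names to weight arrays.
    """
    zeros = total = 0
    for name, w in state_dict.items():
        if any(p in name for p in ("q_proj", "k_proj", "v_proj", "o_proj")):
            zeros += int((w == 0).sum())
            total += w.size
    return zeros / total if total else 0.0
```

Run against this checkpoint's attention layers, the returned fraction should be close to the advertised 30% sparsity.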

Training Details

The base Vicuna v1.3 model was fine-tuned from LLaMA using supervised instruction fine-tuning on approximately 125K user-shared conversations collected from ShareGPT.com. The 30% attention sparsity was then applied post-training with the Wanda pruning method, which scores and removes weights without any retraining or weight updates.