The wang7776/vicuna-7b-v1.3-attention-sparsity-10 is a 7 billion parameter Vicuna v1.3 model, developed by LMSYS, that has been pruned to 10% sparsity in its attention layers using the Wanda method. This pruning technique requires no retraining or weight updates, and aims to preserve competitive performance while reducing the number of active weights and the associated compute. It is primarily intended for research and hobbyist use in natural language processing and artificial intelligence, particularly for exploring efficient large language models and chatbots.
Overview
This model, wang7776/vicuna-7b-v1.3-attention-sparsity-10, is a specialized version of the 7 billion parameter Vicuna v1.3 model, originally developed by LMSYS. It has undergone a pruning process to achieve 10% sparsity in its attention layers using the Wanda pruning method.
Key Characteristics
- Sparsity: Achieves 10% sparsity in attention layers without requiring retraining or weight updates.
- Base Model: Vicuna v1.3, which is itself fine-tuned from the LLaMA architecture.
- Training Data: Fine-tuned on approximately 125K user-shared conversations from ShareGPT.com.
- Performance: Aims to remain competitive with the dense Vicuna v1.3 baseline despite pruning, in line with the Wanda method's design goal of accuracy-preserving, retraining-free sparsification.
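To make the pruning criterion concrete, the sketch below illustrates the core Wanda idea in NumPy: each weight is scored by its magnitude times the L2 norm of the corresponding input activation over a calibration batch, and the lowest-scoring weights in each output row are zeroed with no retraining. This is a minimal illustration, not the authors' implementation; the function name and shapes are assumptions for this sketch.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.10):
    """Zero out a `sparsity` fraction of weights per output row using the
    Wanda score |W_ij| * ||X_j||_2 (weight magnitude times input activation
    norm). No retraining or weight updates are performed.

    W: weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (n_samples, in_features)
    """
    # Per-input-feature activation norms over the calibration batch.
    norms = np.linalg.norm(X, axis=0)           # shape: (in_features,)
    scores = np.abs(W) * norms                  # shape: (out, in)
    k = int(W.shape[1] * sparsity)              # weights to drop per row
    pruned = W.copy()
    if k == 0:
        return pruned
    # For each output row, zero the k weights with the lowest scores.
    idx = np.argsort(scores, axis=1)[:, :k]
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

Scoring per output row (rather than globally) matches Wanda's per-output comparison groups; at 10% sparsity, one in ten weights per row is removed.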
Intended Use Cases
- Research: Ideal for researchers studying efficient large language models, model compression techniques, and the impact of sparsity on performance.
- Hobbyist Exploration: Suitable for hobbyists interested in experimenting with pruned models and chatbots.
- Chatbot Development: Can be used as a base for developing chat assistants, leveraging its instruction-tuned nature.
Getting Started
Users can interact with the model through the FastChat framework, either via its command-line interface or via its OpenAI-compatible and Hugging Face APIs.
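For users who prefer to bypass FastChat, the model can also be loaded directly with the Hugging Face Transformers library. The sketch below is a minimal example; the prompt template is assumed from the upstream Vicuna v1.3 model card, and the generation settings are illustrative defaults rather than recommended values.

```python
MODEL_ID = "wang7776/vicuna-7b-v1.3-attention-sparsity-10"

def vicuna_prompt(user_message: str) -> str:
    """Single-turn Vicuna v1.3 prompt (template assumed from the
    upstream Vicuna model card)."""
    system = ("A chat between a curious user and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the user's questions.")
    return f"{system} USER: {user_message} ASSISTANT:"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the prompt helper above works without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(vicuna_prompt(user_message), return_tensors="pt")
    inputs = inputs.to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("What does attention sparsity mean?"))
```

Note that loading a 7B model in full precision requires roughly 14 GB of memory; `device_map="auto"` lets Accelerate place layers across available devices.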