The wang7776/vicuna-7b-v1.3-attention-sparsity-10 is a 7 billion parameter Vicuna v1.3 model, developed by LMSYS, that has been pruned to 10% sparsity in its attention layers using the Wanda method. This pruning technique requires no retraining or weight updates, and aims to preserve competitive performance while reducing the number of active weights and the associated compute. It is primarily intended for research and hobbyist use in natural language processing and artificial intelligence, particularly for exploring efficient large language models and chatbots.
Overview
This model, wang7776/vicuna-7b-v1.3-attention-sparsity-10, is a specialized version of the 7 billion parameter Vicuna v1.3 model, originally developed by LMSYS. It has undergone a pruning process to achieve 10% sparsity in its attention layers using the Wanda pruning method.
Key Characteristics
- Sparsity: Achieves 10% sparsity in attention layers without requiring retraining or weight updates.
- Base Model: Vicuna v1.3, which is itself fine-tuned from the LLaMA architecture.
- Training Data: Fine-tuned on approximately 125K user-shared conversations from ShareGPT.com.
- Performance: Aims to remain competitive with the dense Vicuna v1.3 baseline despite pruning, in line with the Wanda method's design goal of accuracy-preserving, retraining-free sparsification.
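To make the pruning criterion concrete, the sketch below illustrates the core Wanda idea in NumPy: each weight is scored by its magnitude times the L2 norm of the corresponding input activation over a calibration batch, and the lowest-scoring weights in each output row are zeroed with no retraining. This is a minimal illustration, not the authors' implementation; the function name and shapes are assumptions for this sketch.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.10):
    """Zero out a `sparsity` fraction of weights per output row using the
    Wanda score |W_ij| * ||X_j||_2 (weight magnitude times input activation
    norm). No retraining or weight updates are performed.

    W: weight matrix, shape (out_features, in_features)
    X: calibration activations, shape (n_samples, in_features)
    """
    # Per-input-feature activation norms over the calibration batch.
    norms = np.linalg.norm(X, axis=0)           # shape: (in_features,)
    scores = np.abs(W) * norms                  # shape: (out, in)
    k = int(W.shape[1] * sparsity)              # weights to drop per row
    pruned = W.copy()
    if k == 0:
        return pruned
    # For each output row, zero the k weights with the lowest scores.
    idx = np.argsort(scores, axis=1)[:, :k]
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

Scoring per output row (rather than globally) matches Wanda's per-output comparison groups; at 10% sparsity, one in ten weights per row is removed.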
Intended Use Cases
- Research: Ideal for researchers studying efficient large language models, model compression techniques, and the impact of sparsity on performance.
- Hobbyist Exploration: Suitable for hobbyists interested in experimenting with pruned models and chatbots.
- Chatbot Development: Can be used as a base for developing chat assistants, leveraging its instruction-tuned nature.
Getting Started
Users can interact with the model through the FastChat framework, either via its command-line interface or via its OpenAI-compatible and Hugging Face APIs.
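For users who prefer to bypass FastChat, the model can also be loaded directly with the Hugging Face Transformers library. The sketch below is a minimal example; the prompt template is assumed from the upstream Vicuna v1.3 model card, and the generation settings are illustrative defaults rather than recommended values.

```python
MODEL_ID = "wang7776/vicuna-7b-v1.3-attention-sparsity-10"

def vicuna_prompt(user_message: str) -> str:
    """Single-turn Vicuna v1.3 prompt (template assumed from the
    upstream Vicuna model card)."""
    system = ("A chat between a curious user and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the user's questions.")
    return f"{system} USER: {user_message} ASSISTANT:"

def generate(user_message: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the prompt helper above works without
    # transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(vicuna_prompt(user_message), return_tensors="pt")
    inputs = inputs.to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("What does attention sparsity mean?"))
```

Note that loading a 7B model in full precision requires roughly 14 GB of memory; `device_map="auto"` lets Accelerate place layers across available devices.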