wang7776/vicuna-7b-v1.3-sparsity-10

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4K · Published: Jan 16, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

wang7776/vicuna-7b-v1.3-sparsity-10 is a 7-billion-parameter auto-regressive language model, fine-tuned from LLaMA by LMSYS, that has been pruned to 10% sparsity with the Wanda method. Wanda zeroes out weights without retraining or weight updates, aiming to keep performance competitive with the dense model. The checkpoint is intended primarily for research and development in large language models and chatbots.


Overview

This model, wang7776/vicuna-7b-v1.3-sparsity-10, is a variant of the 7-billion-parameter Vicuna v1.3 model from LMSYS: an auto-regressive language model fine-tuned from LLaMA on approximately 125K user-shared conversations collected from ShareGPT.com. What distinguishes this checkpoint is the Wanda pruning method, which achieves 10% sparsity without retraining or weight updates while aiming to preserve the dense model's performance.
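
The checkpoint can be used like any standard Hugging Face causal LM. Below is a minimal loading-and-inference sketch with the transformers library; the generation settings, the example question, and the fp16/single-GPU assumptions are illustrative, and Vicuna v1.3 is generally prompted with the "USER: ... ASSISTANT:" conversation format shown here.

```python
# Minimal sketch: load the checkpoint and run one turn of chat.
# Assumptions: fp16 weights fit on the available accelerator(s);
# sampling parameters are illustrative, not tuned.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/vicuna-7b-v1.3-sparsity-10"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Vicuna v1.3 conversation format: system preamble, then USER/ASSISTANT turns.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: What is weight pruning in neural networks? ASSISTANT:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```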

Key Capabilities

  • Chat Assistant: Functions as a chat assistant, fine-tuned on conversational data.
  • Sparsity: Incorporates 10% sparsity via Wanda pruning, potentially offering efficiency benefits (the pruning criterion is sketched after this list).
  • Research Tool: Primarily designed for research and development in large language models and chatbots.
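
To make the Wanda criterion concrete, here is a minimal sketch of how it scores and removes weights in a single linear layer, following Sun et al. (2023): each weight is ranked by its magnitude scaled by the L2 norm of the corresponding input feature over a calibration batch, and the lowest-scoring fraction is zeroed per output row. The function name, shapes, and calibration data below are illustrative, not the authors' implementation.

```python
# Minimal sketch of the Wanda pruning criterion for one linear layer.
import torch

def wanda_prune(weight: torch.Tensor, activations: torch.Tensor,
                sparsity: float = 0.10) -> torch.Tensor:
    """Zero the lowest-scoring fraction of weights, per output row.

    weight:      (out_features, in_features) matrix of a linear layer
    activations: (num_tokens, in_features) calibration inputs to that layer
    """
    # Wanda score: |weight| scaled by the L2 norm of each input feature.
    feature_norms = activations.norm(p=2, dim=0)        # (in_features,)
    scores = weight.abs() * feature_norms.unsqueeze(0)  # (out, in)

    # Within each output row, zero the num_prune lowest-scoring weights.
    num_prune = int(weight.shape[1] * sparsity)
    pruned = weight.clone()
    if num_prune > 0:
        _, idx = torch.topk(scores, num_prune, dim=1, largest=False)
        pruned.scatter_(1, idx, 0.0)
    return pruned

# Example with random weights and fake calibration data.
W = torch.randn(4096, 4096)
X = torch.randn(512, 4096)
W_sparse = wanda_prune(W, X, sparsity=0.10)
print(f"sparsity: {(W_sparse == 0).float().mean():.2%}")
```

Because the score is computed from a forward pass over a small calibration set, no gradient computation, retraining, or weight update is needed, which is what makes the method cheap to apply to a 7B model.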

Good For

  • LLM Research: Ideal for researchers and hobbyists exploring sparse models and their performance characteristics.
  • Chatbot Development: Suitable for experimenting with conversational AI applications based on the Vicuna architecture.
  • Efficiency Studies: Useful for investigating the impact of pruning techniques like Wanda on model size and inference efficiency without significant performance degradation (a sketch for verifying the checkpoint's actual sparsity follows).
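
For such studies, a first sanity check is to confirm the fraction of zeroed weights in the loaded checkpoint. A minimal sketch is below; restricting the count to torch.nn.Linear layers is an assumption, since pruning methods like Wanda typically target the linear projections and leave embeddings and norms dense.

```python
# Minimal sketch: measure overall weight sparsity of the linear layers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "wang7776/vicuna-7b-v1.3-sparsity-10", torch_dtype=torch.float16
)

total, zeros = 0, 0
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        w = module.weight
        total += w.numel()
        zeros += (w == 0).sum().item()

print(f"zeroed weights in Linear layers: {zeros / total:.2%}")
```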