wang7776/vicuna-7b-v1.3-sparsity-20

Text generation · Concurrency cost: 1 · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Jan 15, 2024 · Architecture: Transformer

The wang7776/vicuna-7b-v1.3-sparsity-20 model is a 7-billion-parameter auto-regressive language model based on the LLaMA architecture and fine-tuned by LMSYS. This version has been pruned to 20% sparsity with the Wanda method, which zeroes out low-importance weights without any retraining while maintaining competitive performance. It is intended primarily for research and hobbyist use in natural language processing and chatbot development, and excels as a chat assistant.


Overview

This model, wang7776/vicuna-7b-v1.3-sparsity-20, is a 7-billion-parameter variant of the Vicuna v1.3 chat assistant, developed by LMSYS. It was fine-tuned from LLaMA via supervised instruction fine-tuning on approximately 125K conversations collected from ShareGPT.com.

Key Differentiator: Sparsity

What sets this model apart is its 20% sparsity, achieved through the Wanda pruning method. This technique allows for significant model compression without requiring retraining or weight updates, aiming to preserve performance while reducing computational overhead.
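The Wanda criterion can be sketched in a few lines of NumPy. The idea, as described in the Wanda paper, is to score each weight by its magnitude times the L2 norm of the corresponding input activation, then zero the lowest-scoring weights within each output row. This is an illustrative sketch (the `wanda_prune` helper and calibration setup are ours, not from this model's release), assuming a dense weight matrix and a small batch of calibration activations:

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.2):
    """Zero the lowest-scoring weights per output row (Wanda-style sketch).

    Score S_ij = |W_ij| * ||X_j||_2: weight magnitude scaled by the
    activation norm of the input feature it multiplies.
    W: (out_features, in_features) weight matrix.
    X: (n_samples, in_features) calibration activations.
    """
    norms = np.linalg.norm(X, axis=0)      # ||X_j||_2 per input feature
    scores = np.abs(W) * norms             # broadcasts over output rows
    k = int(W.shape[1] * sparsity)         # weights to drop per row
    if k == 0:
        return W.copy()
    # indices of the k smallest scores in each row (unordered partition)
    idx = np.argpartition(scores, k - 1, axis=1)[:, :k]
    W_pruned = W.copy()
    np.put_along_axis(W_pruned, idx, 0.0, axis=1)
    return W_pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 10))               # toy layer: 4 outputs, 10 inputs
X = rng.normal(size=(32, 10))              # toy calibration batch
W20 = wanda_prune(W, X, sparsity=0.2)      # 20% of each row zeroed
```

Because no retraining is involved, the pruned matrix can be swapped in for the dense one directly; the per-row comparison group mirrors how Wanda preserves each output unit's most influential connections.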

Capabilities & Use Cases

  • Chat Assistant: Designed to function as a conversational AI, fine-tuned on real user-shared dialogues.
  • Research & Development: Primarily intended for researchers and hobbyists exploring large language models and chatbot technologies.
  • Efficient Deployment: The pruned nature of this model makes it potentially more efficient for deployment in resource-constrained environments compared to its dense counterpart, while still offering competitive performance.
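As a back-of-the-envelope check on the efficiency claim, 20% unstructured sparsity leaves 80% of the 7B weights non-zero. Note that zeroed weights only translate into memory or latency savings when a sparse storage format or sparsity-aware kernels are used; a dense checkpoint stays the same size on disk. The figures below are illustrative arithmetic, not published measurements:

```python
# Rough weight-budget arithmetic for a 7B model at 20% sparsity.
total_params = 7_000_000_000
sparsity = 0.20

nonzero_params = int(total_params * (1 - sparsity))
bytes_per_param_fp16 = 2  # FP16 for illustration; this deployment lists FP8
dense_gb = total_params * bytes_per_param_fp16 / 1e9
nonzero_gb = nonzero_params * bytes_per_param_fp16 / 1e9

print(f"non-zero parameters: {nonzero_params:,}")   # 5,600,000,000
print(f"dense FP16 weights:  {dense_gb:.1f} GB")    # 14.0 GB
print(f"non-zero FP16 data:  {nonzero_gb:.1f} GB")  # 11.2 GB
```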

Getting Started

Users can interact with the model via command-line interfaces or through OpenAI-compatible and Hugging Face APIs. Further details on its evaluation, including standard benchmarks, human preference studies, and LLM-as-a-judge methods, are available in the associated paper and leaderboard.
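A minimal Hugging Face `transformers` loading sketch is shown below. The `build_vicuna_prompt` helper is our own illustration of the USER/ASSISTANT conversation template used by Vicuna v1.x models; check the Vicuna documentation for the exact template this checkpoint expects. Model loading is kept behind the `__main__` guard so the prompt helper can be used without downloading the 7B checkpoint:

```python
def build_vicuna_prompt(user_message: str) -> str:
    # Vicuna v1.x-style template: system preamble, then USER/ASSISTANT turns.
    system = ("A chat between a curious user and an artificial intelligence "
              "assistant. The assistant gives helpful, detailed, and polite "
              "answers to the user's questions.")
    return f"{system} USER: {user_message} ASSISTANT:"

if __name__ == "__main__":
    # Heavy part: downloads and runs the actual checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "wang7776/vicuna-7b-v1.3-sparsity-20"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_vicuna_prompt("What is weight pruning?"),
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same model id can also be served behind an OpenAI-compatible endpoint (e.g. via FastChat or vLLM), in which case the serving layer applies the conversation template for you.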