This is a 7 billion parameter Llama 2 Chat model, developed by Meta, that has been pruned to 30% attention sparsity using the Wanda method. This pruning technique reduces model size without requiring retraining or weight updates, aiming to maintain competitive performance. The model is fine-tuned for dialogue use cases and supports a 4096-token context length, making it suitable for assistant-like chat applications.
Overview
This model is a 7 billion parameter variant of Meta's Llama 2 Chat, specifically modified with 30% attention sparsity. The sparsity was achieved using the Wanda pruning method, which is notable for not requiring any retraining or weight updates while aiming to preserve performance. The base Llama 2 architecture is an optimized transformer, pretrained on 2 trillion tokens of publicly available data with a cutoff of September 2022, and fine-tuned with over one million human-annotated examples up to July 2023.
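As a rough illustration of how Wanda works, the sketch below prunes a weight matrix by scoring each weight as its magnitude times the L2 norm of the corresponding input feature (measured on calibration activations), then zeroing the lowest-scoring fraction per output row. This is a minimal NumPy sketch of the general technique, not the actual pruning code used for this model; the function name and shapes are illustrative.

```python
import numpy as np

def wanda_prune(W, X, sparsity=0.3):
    """Sketch of Wanda pruning: zero out `sparsity` fraction of weights
    per output row, scored by |W_ij| * ||X_j||_2, with no retraining
    or weight updates. W: (out_features, in_features),
    X: (n_samples, in_features) calibration activations."""
    feat_norm = np.linalg.norm(X, axis=0)        # per-input-feature norm
    score = np.abs(W) * feat_norm                # Wanda importance score
    k = int(W.shape[1] * sparsity)               # weights to drop per row
    drop = np.argsort(score, axis=1)[:, :k]      # lowest-scoring indices
    mask = np.ones_like(W, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return W * mask
```

Because the score uses only weight magnitudes and calibration activations, the prune is a single forward-pass-style computation: no gradient steps or weight updates are needed.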
Key Capabilities
- Dialogue Optimization: Fine-tuned for assistant-like chat applications, outperforming many open-source chat models.
- Efficient Inference: The 30% attention sparsity can lead to more efficient inference compared to the unpruned base model.
- Robust Performance: Despite pruning, it is designed to maintain competitive performance, leveraging the strong foundation of the Llama 2 Chat model.
- Context Length: Supports a 4096-token context window, suitable for extended conversations.
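For the dialogue use case above, prompts should follow the standard Llama 2 chat template, which wraps the user turn in `[INST] ... [/INST]` markers with an optional `<<SYS>>` system block. The helper below is an illustrative single-turn formatter (the function name is ours, not part of the model's API):

```python
def build_llama2_chat_prompt(system_prompt: str, user_message: str) -> str:
    """Format a single-turn prompt in the Llama 2 chat template:
    a <<SYS>> system block followed by the user turn inside
    [INST] ... [/INST] markers."""
    return (
        f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )
```

The model's response is generated after the closing `[/INST]`; multi-turn conversations repeat the `[INST] ... [/INST]` pattern, and the full formatted history must fit within the 4096-token context window.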
Good For
- Chatbots and Virtual Assistants: Its fine-tuning for dialogue makes it well-suited for conversational AI.
- Resource-Constrained Deployments: The sparsity introduced by pruning can lower compute and memory demands, which may help in environments where resources are limited.
- English Language Tasks: Intended for commercial and research use primarily in English.
Limitations
- License Restrictions: Governed by a custom commercial license from Meta, requiring acceptance before use.
- Potential for Inaccuracies: Like all LLMs, it may produce inaccurate, biased, or objectionable responses, necessitating safety testing for specific applications.
- English Only: Not intended for use in languages other than English.