wang7776/Llama-2-7b-chat-hf-10-sparsity
wang7776/Llama-2-7b-chat-hf-10-sparsity is a sparsified variant of Meta's 7 billion parameter Llama 2 Chat model, with 10% of weights removed via Wanda pruning and no retraining. This optimization aims to reduce model size while maintaining competitive performance. The base model is fine-tuned for dialogue use cases in English and supports a context length of 4096 tokens.
Overview
This model, wang7776/Llama-2-7b-chat-hf-10-sparsity, is a 7 billion parameter variant of Meta's Llama 2 Chat model, modified to achieve 10% sparsity using the Wanda pruning method. Wanda scores each weight by the product of its magnitude and the norm of the corresponding input activation, then zeroes the lowest-scoring weights; this reduces the number of active parameters without any retraining or weight updates, aiming to preserve the base model's performance.
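The Wanda scoring rule described above can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation; the function name is made up, and the row-wise (per-output-channel) pruning groups follow the description in the Wanda paper.

```python
import numpy as np

def wanda_prune(weight, activations, sparsity=0.10):
    """Zero the lowest-scoring `sparsity` fraction of weights per output row.

    weight:      (out_features, in_features) weight matrix
    activations: (n_samples, in_features) calibration inputs
    """
    # Per-input-channel L2 norm of the calibration activations
    act_norm = np.linalg.norm(activations, axis=0)   # shape (in_features,)
    # Wanda importance score: |W_ij| * ||X_j||_2
    score = np.abs(weight) * act_norm
    k = int(weight.shape[1] * sparsity)              # weights to drop per row
    pruned = weight.copy()
    if k > 0:
        # Indices of the k lowest-scoring weights in each output row
        idx = np.argsort(score, axis=1)[:, :k]
        np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned
```

Because pruning is applied independently per output row, every row ends up with exactly the target fraction of zeros, which matches Wanda's per-output comparison groups.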
Key Capabilities & Features
- Base Model: Built upon the Llama 2 7B Chat model, which is optimized for dialogue use cases.
- Sparsity: Incorporates 10% sparsity via Wanda pruning, which can reduce the memory footprint and, with sparse-aware kernels, improve inference speed.
- Architecture: Utilizes an optimized transformer architecture, fine-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) for helpfulness and safety.
- Context Length: Supports a context length of 4096 tokens.
- Language: Primarily intended for commercial and research use in English.
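The stated 10% sparsity level can be verified directly by counting exactly-zero entries in the checkpoint's weight tensors. A minimal sketch, assuming the tensors are available as NumPy arrays (the helper name is illustrative):

```python
import numpy as np

def weight_sparsity(tensors):
    """Return the fraction of exactly-zero entries across weight arrays."""
    zeros = sum(int((t == 0).sum()) for t in tensors)
    total = sum(t.size for t in tensors)
    return zeros / total
```

For a model card like this one, the same count would be taken over the pruned linear-layer weights of the loaded checkpoint.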
Intended Use Cases
- Dialogue Applications: Optimized for assistant-like chat functionalities.
- Research: Suitable for research into sparse models and their performance characteristics.
- Commercial Use: Permitted under a custom commercial license from Meta.
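For the assistant-style dialogue use cases above, Llama 2 Chat checkpoints expect the `[INST]` prompt template. A minimal sketch of building a single-turn prompt (the helper name is made up, and in practice the tokenizer usually adds the `<s>` BOS token itself):

```python
def format_llama2_chat(user_msg, system_msg=None):
    """Build a single-turn Llama 2 Chat prompt using the [INST] template."""
    if system_msg:
        # System prompt is wrapped in <<SYS>> markers inside the first [INST]
        return f"[INST] <<SYS>>\n{system_msg}\n<</SYS>>\n\n{user_msg} [/INST]"
    return f"[INST] {user_msg} [/INST]"
```

The formatted string is then tokenized and passed to the model's generate call as with any other causal LM.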