wang7776/Llama-2-7b-chat-hf-30-sparsity

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 11, 2023 · License: other · Architecture: Transformer

This is a 7 billion parameter Llama 2 Chat model from Meta that has been pruned to 30% sparsity with the Wanda method. Wanda removes low-importance weights without any retraining or weight updates, so the pruned model retains competitive performance. Optimized for dialogue use cases, it is intended for commercial and research applications in English and offers a more efficient deployment option due to its reduced size.


Overview

This model is a 7 billion parameter variant of Meta's Llama 2 Chat, optimized for dialogue use cases. It has been pruned with the Wanda method to 30% sparsity, with no retraining or weight updates required. The result is a more efficient version of the base Llama 2 7B Chat model, which is known for outperforming many open-source chat models and being comparable to some closed-source alternatives in helpfulness and safety.
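
The Wanda criterion mentioned above can be summarized as: score each weight by its magnitude times the L2 norm of the corresponding input activation (measured on a small calibration set), then zero out the lowest-scoring fraction of weights within each output row. The sketch below is a minimal, illustrative PyTorch version of that idea; the `wanda_prune` helper and its arguments are hypothetical and not part of this repository, and the released checkpoint already has the pruning applied.

```python
import torch

def wanda_prune(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float = 0.3) -> torch.Tensor:
    """Illustrative Wanda-style pruning of one linear layer.

    weight:   (out_features, in_features) weight matrix
    act_norm: (in_features,) L2 norm of each input feature's activations,
              collected from a small calibration set
    """
    # Wanda importance score: |weight| scaled by the input activation norm.
    score = weight.abs() * act_norm.unsqueeze(0)

    # Within each output row, zero the fraction of weights with the lowest
    # scores; no retraining or weight updates follow.
    k = int(weight.shape[1] * sparsity)
    prune_idx = torch.topk(score, k, dim=1, largest=False).indices
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask
```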

Key Capabilities

  • Efficient Deployment: Achieves 30% sparsity, reducing model size and potentially inference costs, while maintaining competitive performance.
  • Dialogue Optimization: Fine-tuned for assistant-like chat applications via supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF); see the usage sketch after this list.
  • English Language Support: Intended for commercial and research use in English.
  • Transformer Architecture: Built on an optimized transformer architecture with a 4k context length.
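
The sketch below shows one way to load the checkpoint and query it with the Llama 2 chat prompt format. It is a minimal example assuming the standard Hugging Face transformers API and the repository id shown at the top of this card; the system prompt and generation settings are placeholders, not recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Llama-2-7b-chat-hf-30-sparsity"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Llama 2 Chat expects the [INST] / <<SYS>> prompt format used during its SFT/RLHF training.
prompt = (
    "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    "What is weight pruning? [/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```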

Good For

  • Developers seeking a more resource-efficient version of the Llama 2 7B Chat model.
  • Applications requiring a capable dialogue model with reduced memory footprint.
  • Commercial and research projects focused on English-language conversational AI.