wang7776/Llama-2-7b-chat-hf-30-sparsity

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 11, 2023 · License: other · Architecture: Transformer

This is a 7 billion parameter Llama 2 Chat model from Meta that has been pruned to 30% sparsity with the Wanda method. Wanda removes low-importance weights without any retraining or weight updates, so the pruned model retains competitive performance. Optimized for dialogue use cases, it is intended for commercial and research applications in English and offers a more efficient deployment option due to its reduced size.


Overview

This model is a 7 billion parameter variant of Meta's Llama 2 Chat, optimized for dialogue use cases. It has been pruned with the Wanda method to 30% sparsity, with no retraining or weight updates required. The result is a more efficient version of the base Llama 2 7B Chat model, which is known for outperforming many open-source chat models and being comparable to some closed-source alternatives in helpfulness and safety.
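
The Wanda criterion mentioned above can be summarized as: score each weight by its magnitude times the L2 norm of the corresponding input activation (measured on a small calibration set), then zero out the lowest-scoring fraction of weights within each output row. The sketch below is a minimal, illustrative PyTorch version of that idea; the `wanda_prune` helper and its arguments are hypothetical and not part of this repository, and the released checkpoint already has the pruning applied.

```python
import torch

def wanda_prune(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float = 0.3) -> torch.Tensor:
    """Illustrative Wanda-style pruning of one linear layer.

    weight:   (out_features, in_features) weight matrix
    act_norm: (in_features,) L2 norm of each input feature's activations,
              collected from a small calibration set
    """
    # Wanda importance score: |weight| scaled by the input activation norm.
    score = weight.abs() * act_norm.unsqueeze(0)

    # Within each output row, zero the fraction of weights with the lowest
    # scores; no retraining or weight updates follow.
    k = int(weight.shape[1] * sparsity)
    prune_idx = torch.topk(score, k, dim=1, largest=False).indices
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask
```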

Key Capabilities

  • Efficient Deployment: Achieves 30% sparsity, reducing model size and potentially inference costs, while maintaining competitive performance.
  • Dialogue Optimization: Fine-tuned for assistant-like chat applications via supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF); see the usage sketch after this list.
  • English Language Support: Intended for commercial and research use in English.
  • Transformer Architecture: Built on an optimized transformer architecture with a 4k context length.
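
The sketch below shows one way to load the checkpoint and query it with the Llama 2 chat prompt format. It is a minimal example assuming the standard Hugging Face transformers API and the repository id shown at the top of this card; the system prompt and generation settings are placeholders, not recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wang7776/Llama-2-7b-chat-hf-30-sparsity"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Llama 2 Chat expects the [INST] / <<SYS>> prompt format used during its SFT/RLHF training.
prompt = (
    "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    "What is weight pruning? [/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```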

Good For

  • Developers seeking a more resource-efficient version of the Llama 2 7B Chat model.
  • Applications requiring a capable dialogue model with reduced memory footprint.
  • Commercial and research projects focused on English-language conversational AI.