wang7776/Llama-2-7b-chat-hf-20-sparsity

TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Dec 13, 2023License:otherArchitecture:Transformer Cold

wang7776/Llama-2-7b-chat-hf-20-sparsity is a 7 billion parameter Llama 2 Chat model developed by Meta, featuring a 4096-token context length. This specific version has been pruned to 20% sparsity using the Wanda method, which reduces model size without retraining while maintaining competitive performance. It is fine-tuned for dialogue use cases and optimized for assistant-like chat in English.

Loading preview...

Model Overview

This model, wang7776/Llama-2-7b-chat-hf-20-sparsity, is a 7 billion parameter variant of Meta's Llama 2 Chat series, designed for dialogue applications. It leverages an optimized transformer architecture and has been fine-tuned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.

Key Differentiators

  • Sparsity: This version has been pruned to 20% sparsity using the Wanda method, which aims to reduce model size and computational requirements without requiring retraining or weight updates, while still achieving competitive performance.
  • Dialogue Optimization: As a Llama-2-Chat model, it is specifically optimized for assistant-like conversational use cases.
  • Performance: The base Llama 2 Chat models have shown competitive performance against other open-source chat models and are on par with some closed-source models in human evaluations for helpfulness and safety.

Intended Use Cases

  • Commercial and Research: Suitable for both commercial and research applications in English.
  • Assistant-like Chat: Primarily intended for generating human-like responses in dialogue systems.

Limitations

  • English Only: Intended for use in English; performance in other languages is not guaranteed.
  • Safety Considerations: As with all LLMs, it may produce inaccurate, biased, or objectionable responses, requiring developers to perform safety testing and tuning for specific applications.