NousResearch/Llama-2-70b-chat-hf

TEXT GENERATIONConcurrency Cost:4Model Size:69BQuant:FP8Ctx Length:32kPublished:Jul 19, 2023Architecture:Transformer0.0K Cold

NousResearch/Llama-2-70b-chat-hf is a 69 billion parameter, instruction-tuned generative text model developed by Meta, optimized for dialogue use cases. This Llama 2 variant utilizes an optimized transformer architecture with Grouped-Query Attention for improved inference scalability. It is fine-tuned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to enhance helpfulness and safety, making it suitable for assistant-like chat applications.

Loading preview...

Overview

NousResearch/Llama-2-70b-chat-hf is the 70 billion parameter, fine-tuned variant of Meta's Llama 2 family of large language models. Optimized for dialogue, this model is designed for assistant-like chat applications. It was developed between January and July 2023, leveraging an optimized transformer architecture and Grouped-Query Attention (GQA) for efficient inference.

Key Capabilities

  • Dialogue Optimization: Specifically fine-tuned for chat and conversational use cases through supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF).
  • Performance: Outperforms many open-source chat models on various benchmarks and achieves performance comparable to popular closed-source models like ChatGPT and PaLM in human evaluations for helpfulness and safety.
  • Scalability: The 70B model incorporates Grouped-Query Attention (GQA) to improve inference scalability.
  • Safety Alignment: Fine-tuned versions show strong safety performance, with the 70B-Chat model achieving 64.14% on TruthfulQA (truthful and informative generations) and 0.01% on ToxiGen (toxic generations).

Good For

  • Commercial and Research Use: Intended for both commercial and research applications in English.
  • Assistant-like Chat: Ideal for building conversational AI agents, chatbots, and virtual assistants.
  • Natural Language Generation: While specifically tuned for chat, its base architecture can be adapted for various text generation tasks.