4bit/Llama-2-70b-chat-hf

TEXT GENERATIONConcurrency Cost:4Model Size:69BQuant:FP8Ctx Length:32kPublished:Jul 19, 2023Architecture:Transformer0.0K Cold

Llama-2-70b-chat-hf is a 69 billion parameter generative text model developed by Meta, fine-tuned for dialogue use cases. This model utilizes an optimized transformer architecture and is specifically designed for assistant-like chat applications. It leverages supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety, outperforming many open-source chat models on various benchmarks.

Loading preview...

Llama-2-70b-chat-hf Overview

This model is the 70 billion parameter variant from Meta's Llama 2 family, specifically fine-tuned for dialogue use cases. It is an auto-regressive language model built on an optimized transformer architecture. The 'chat' versions, including this one, are enhanced using Supervised Fine-Tuning (SFT) and Reinforcement Learning with Human Feedback (RLHF) to improve alignment with human preferences for helpfulness and safety.

Key Capabilities & Features

  • Dialogue Optimization: Specifically trained for assistant-like chat interactions.
  • Performance: Outperforms other open-source chat models on tested benchmarks and is competitive with some closed-source models in human evaluations for helpfulness and safety.
  • Architecture: Employs Grouped-Query Attention (GQA) for improved inference scalability, a feature present in larger Llama 2 models.
  • Training Data: Pretrained on 2 trillion tokens from publicly available sources, with fine-tuning data including over one million human-annotated examples.
  • Context Length: Supports a context length of 4k tokens.

Intended Use Cases

  • Commercial and Research: Designed for use in both commercial products and research applications.
  • Assistant-like Chat: Optimized for conversational AI and chatbot development.

Limitations

  • English Only: Intended for use primarily in English.
  • Safety Considerations: As with all LLMs, requires developer-side safety testing and tuning for specific applications due to potential for inaccurate, biased, or objectionable responses.