4bit/Llama-2-7b-chat-hf

TEXT GENERATIONConcurrency Cost:1Model Size:7BQuant:FP8Ctx Length:4kPublished:Jul 19, 2023Architecture:Transformer0.0K Cold

The 4bit/Llama-2-7b-chat-hf model is a 7 billion parameter, fine-tuned generative text model developed by Meta, optimized for dialogue use cases. It utilizes an optimized transformer architecture and has a context length of 4096 tokens. This model is specifically designed for assistant-like chat applications and outperforms many open-source chat models on benchmarks for helpfulness and safety. It was trained on 2 trillion tokens of publicly available data, with fine-tuning data including over one million human-annotated examples.

Loading preview...

Model Overview

This is the Hugging Face Transformers format of Meta's Llama-2-7b-chat model, a 7 billion parameter large language model. It is part of the Llama 2 family, which includes models ranging from 7B to 70B parameters, all pretrained and fine-tuned for generative text tasks. The Llama-2-Chat variants, including this 7B model, are specifically optimized for dialogue and assistant-like chat applications.

Key Capabilities & Features

  • Optimized for Dialogue: Fine-tuned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety in chat scenarios.
  • Performance: Outperforms many open-source chat models on various benchmarks and achieves competitive results with some popular closed-source models like ChatGPT and PaLM in human evaluations for helpfulness and safety.
  • Architecture: Employs an auto-regressive language model with an optimized transformer architecture.
  • Training Data: Pretrained on 2 trillion tokens from publicly available online data, with fine-tuning incorporating over one million human-annotated examples. Data cutoff for pretraining is September 2022, with some tuning data up to July 2023.
  • Context Length: Supports a context length of 4096 tokens.

Intended Use Cases

  • Assistant-like Chat: Primarily intended for commercial and research use in English for conversational AI.
  • Natural Language Generation: While the chat version is tuned for dialogue, the base Llama 2 models can be adapted for various natural language generation tasks.

Limitations

  • English Only: Testing has been conducted in English, and use in other languages is considered out-of-scope.
  • Potential for Objectionable Content: As with all LLMs, it may produce inaccurate, biased, or objectionable responses, requiring developers to perform safety testing for specific applications.