NousResearch/Meta-Llama-3-8B-Instruct

Warm
Public
8B
FP8
8192
License: llama3
Hugging Face
Overview

Model Overview

NousResearch/Meta-Llama-3-8B-Instruct is an 8 billion parameter instruction-tuned model from Meta's Llama 3 family, designed for commercial and research use in English. It is optimized for dialogue and assistant-like chat applications, built upon an optimized transformer architecture with a context length of 8192 tokens. The model was trained on over 15 trillion tokens of publicly available data, with fine-tuning data including over 10 million human-annotated examples, and features Grouped-Query Attention (GQA) for improved inference scalability.

Key Capabilities & Performance

  • Dialogue Optimization: Specifically instruction-tuned for chat and assistant-like interactions, outperforming previous Llama 2 models and many other open-source chat models on industry benchmarks.
  • Enhanced Safety & Helpfulness: Developed with a strong focus on optimizing helpfulness and safety through SFT and RLHF, and significantly reduces false refusals compared to Llama 2.
  • Strong Benchmark Results: Demonstrates notable improvements across various benchmarks, including MMLU (68.4), HumanEval (62.2), and GSM-8K (79.6), showcasing its capabilities in reasoning, code generation, and mathematical tasks.

Intended Use Cases

  • Assistant-like Chatbots: Ideal for building conversational AI agents and virtual assistants.
  • Natural Language Generation: Adaptable for a variety of text generation tasks in English.
  • Commercial and Research Applications: Suitable for both commercial deployments and academic research, with a custom commercial license available.