trinhvanhung/Meta-Llama-3.1-8B-Instruct-Q4_K_M
TEXT GENERATION
Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Dec 22, 2024 · License: llama3.1 · Architecture: Transformer · Cold

Meta-Llama-3.1-8B-Instruct-Q4_K_M is an 8-billion-parameter instruction-tuned generative language model from Meta's Llama 3.1 collection. It uses an optimized transformer architecture with Grouped-Query Attention and a 128k context length, and was trained on over 15 trillion tokens with a December 2023 knowledge cutoff. The model is optimized for multilingual dialogue, outperforms many open-source and closed chat models on common industry benchmarks, and supports tool use.


Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model.

temperature
top_p
top_k
frequency_penalty
presence_penalty
repetition_penalty
min_p
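The sampler parameters listed above map directly onto fields of an OpenAI-compatible chat-completions request body. Below is a minimal sketch of assembling such a payload; the preset values are illustrative placeholders, not Featherless defaults, and the extended fields (`top_k`, `repetition_penalty`, `min_p`) are common inference-server extensions rather than part of the official OpenAI schema.

```python
import json

# Hypothetical sampler preset — placeholder values, not Featherless defaults.
SAMPLER_PRESET = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,  # extension field; not in the official OpenAI API
    "min_p": 0.05,              # extension field; not in the official OpenAI API
}

def build_payload(prompt: str, preset: dict) -> dict:
    """Assemble an OpenAI-compatible chat-completions body with sampler settings."""
    return {
        "model": "trinhvanhung/Meta-Llama-3.1-8B-Instruct-Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        **preset,
    }

payload = build_payload("Hello!", SAMPLER_PRESET)
print(json.dumps(payload, indent=2))
```

Sending this payload to the provider's chat-completions endpoint (with an API key) applies the chosen sampler combination to the request.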