Meta-Llama-3-8B: An Advanced 8B Parameter LLM from Meta
Meta-Llama-3-8B is an 8-billion-parameter large language model from Meta's Llama 3 family, released in both pretrained and instruction-tuned variants and designed for generative text and code. It uses an optimized decoder-only transformer architecture and incorporates Grouped-Query Attention (GQA) for improved inference scalability. The instruction-tuned variant is optimized for dialogue use cases and outperforms the earlier Llama 2 7B and 13B models on a range of industry benchmarks.
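To make the GQA idea concrete, here is a minimal, illustrative sketch in NumPy: several query heads share a single key/value head, shrinking the KV projections (and KV cache) relative to full multi-head attention. Head counts, shapes, and weight matrices below are toy values for illustration, not Llama 3's actual configuration.

```python
import numpy as np

def gqa(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Grouped-Query Attention sketch: n_q_heads query heads share
    n_kv_heads key/value heads (n_q_heads must divide by n_kv_heads).

    x:  (seq, d_model) input
    wq: (d_model, d_model) query projection
    wk, wv: (d_model, n_kv_heads * head_dim) shared KV projections
    """
    seq, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads  # query heads per KV head

    q = (x @ wq).reshape(seq, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq, n_kv_heads, head_dim)

    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group  # each query head attends via its group's KV head
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        # Numerically stable softmax over the key axis
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, h] = weights @ v[:, kv]
    return out.reshape(seq, d_model)
```

The memory saving comes from `wk`/`wv` producing only `n_kv_heads` heads: with 4 query heads and 2 KV heads, the KV cache is half the size of standard multi-head attention at the same model width.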
Key Capabilities
- Optimized for Dialogue: Instruction-tuned for assistant-like chat applications, aligning with human preferences for helpfulness and safety through supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF).
- Strong Benchmark Performance: The instruction-tuned model scores 68.4 on MMLU (5-shot), 62.2 on HumanEval (0-shot), and 79.6 on GSM-8K (8-shot, CoT), well ahead of the Llama 2 7B and 13B models.
- Extensive Training Data: Pretrained on over 15 trillion tokens of publicly available online data, with a knowledge cutoff of March 2023, and fine-tuned with over 10 million human-annotated examples.
- 8K Context Length: Supports an 8,192-token context window, enabling the model to process longer inputs and generate more coherent responses.
- Responsible AI Focus: Developed with a strong emphasis on safety, including extensive red teaming and adversarial evaluations, with mitigations that reduce residual risks while also lowering the false-refusal rate relative to Llama 2.
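Since the instruction-tuned variant is aimed at dialogue, prompts must follow the Llama 3 chat format, which wraps each turn in special tokens. The helper below is a minimal sketch of that format based on the published template (`<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`); in practice the tokenizer's built-in chat template handles this for you.

```python
def build_llama3_prompt(messages):
    """Format a chat as a Llama 3 instruct prompt string.

    messages: list of {"role": "system"|"user"|"assistant",
                       "content": str} dicts, in conversation order.
    """
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>"
            f"\n\n{m['content']}<|eot_id|>"
        )
    # Trailing assistant header cues the model to generate its reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is GQA?"},
])
```

Generation should stop on `<|eot_id|>` (or the end-of-text token); omitting the trailing assistant header is a common cause of the model continuing the user's turn instead of answering it.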
Good for
- Building high-quality English-language chatbots and virtual assistants.
- General natural language generation tasks requiring high accuracy and coherence.
- Applications benefiting from strong reasoning and code generation capabilities, as indicated by its HumanEval and GSM-8K scores.
- Developers seeking a powerful, openly available model with robust safety considerations for commercial and research use.