PawanKrd/Meta-Llama-3-8B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · Published: Apr 18, 2024 · License: llama3 · Architecture: Transformer

PawanKrd/Meta-Llama-3-8B-Instruct is an 8 billion parameter instruction-tuned causal language model developed by Meta as part of the Llama 3 family. Optimized for dialogue use cases, it uses a transformer architecture with Grouped-Query Attention (GQA) and is fine-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). The model excels in assistant-like chat applications and demonstrates strong performance across benchmarks including MMLU and HumanEval.


Model Overview

PawanKrd/Meta-Llama-3-8B-Instruct is an 8 billion parameter instruction-tuned model from Meta's Llama 3 family, designed for dialogue and assistant-like chat applications. It uses an optimized transformer architecture with Grouped-Query Attention (GQA) for improved inference scalability and was fine-tuned with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model was trained on over 15 trillion tokens of publicly available data, with a knowledge cutoff of March 2023.
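Like other Llama 3 instruct models, this model expects conversations formatted with Meta's published chat template, which wraps each turn in header and end-of-turn special tokens. A minimal sketch of building a single-turn prompt by hand (in practice, `tokenizer.apply_chat_template` from Hugging Face transformers produces this string for you):

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 instruct prompt.

    The special tokens (<|begin_of_text|>, <|start_header_id|>,
    <|eot_id|>) follow Meta's published Llama 3 chat template.
    The trailing assistant header cues the model to generate its reply.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "You are a helpful assistant.",
    "Explain Grouped-Query Attention in one sentence.",
)
```

Generation should stop on `<|eot_id|>` (or `<|end_of_text|>`), so pass those as stop tokens to your inference stack.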

Key Capabilities

  • Enhanced Dialogue Performance: Optimized for chat and assistant-like interactions, outperforming many open-source chat models on industry benchmarks.
  • Strong Benchmark Results: Achieves 68.4% on MMLU (5-shot), 62.2% on HumanEval (0-shot), and 79.6% on GSM-8K (8-shot, CoT), demonstrating significant improvements over Llama 2 models.
  • Reduced Refusals: Fine-tuned to be significantly less likely to falsely refuse benign prompts compared to Llama 2, improving user experience.
  • English Language Focus: Primarily intended for commercial and research use in English, though the license permits fine-tuning for other languages.

Good For

  • Developing conversational AI agents and chatbots.
  • Applications requiring strong general reasoning and code generation capabilities.
  • Research into large language models and their applications.