casperhansen/llama-3-70b-fp16

Hugging Face
Text Generation · Concurrency Cost: 4 · Model Size: 70B · Quant: FP8 · Ctx Length: 8k · Published: Apr 18, 2024 · License: llama3 · Architecture: Transformer

The casperhansen/llama-3-70b-fp16 model is a 70 billion parameter large language model developed by Meta, part of the Llama 3 family. This auto-regressive transformer model is instruction-tuned for dialogue use cases, outperforming many open-source chat models on industry benchmarks. It features an 8k context length and utilizes Grouped-Query Attention (GQA) for improved inference scalability, making it suitable for commercial and research applications requiring high-performance conversational AI.


Overview

casperhansen/llama-3-70b-fp16 is a 70 billion parameter model from Meta's Llama 3 family, released on April 18, 2024. It is an auto-regressive language model built on an optimized transformer architecture, featuring Grouped-Query Attention (GQA) for enhanced inference scalability. The instruction-tuned variant is specifically optimized for dialogue and assistant-like chat applications.
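To illustrate why GQA improves inference scalability, the sketch below estimates the KV-cache memory saved by sharing key/value heads across query-head groups. The architecture numbers used (80 layers, 64 query heads, 8 KV heads, head dimension 128) are Meta's published Llama 3 70B values; the fp16 byte width is an assumption for illustration.

```python
# Rough KV-cache sizing for Llama 3 70B with Grouped-Query Attention (GQA).
# Architecture values (80 layers, 8 KV heads, head dim 128) are the published
# Llama 3 70B figures; 2 bytes/element assumes fp16 cache storage.

def kv_cache_bytes(seq_len, n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_el=2):
    # K and V each store n_kv_heads * head_dim values per token per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_el

gqa = kv_cache_bytes(8192)                  # 8 KV heads (GQA, as shipped)
mha = kv_cache_bytes(8192, n_kv_heads=64)   # hypothetical full multi-head attention

print(f"GQA KV cache at 8k context: {gqa / 2**30:.1f} GiB")
print(f"MHA KV cache at 8k context: {mha / 2**30:.1f} GiB ({mha // gqa}x larger)")
```

Under these assumptions, GQA cuts the per-sequence KV cache by 8x (one KV head serving each group of 8 query heads), which is what makes long contexts and large batch sizes tractable at inference time.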

Key Capabilities

  • High Performance: The Llama 3 70B instruction-tuned model significantly outperforms its predecessor, Llama 2 70B, across various benchmarks, including MMLU (82.0 vs 52.9), HumanEval (81.7 vs 25.6), and GSM-8K (93.0 vs 57.5).
  • Extensive Training Data: Pretrained on over 15 trillion tokens from publicly available online data, with the 70B model's knowledge cutoff extending to December 2023.
  • Optimized for Dialogue: Instruction-tuned using supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety, making it ideal for conversational AI.
  • Robust Safety Measures: Meta has implemented extensive red teaming, adversarial evaluations, and safety mitigations, alongside tools like Meta Llama Guard 2 and Code Shield, to reduce residual risks and improve refusal handling compared to Llama 2.

Good for

  • Commercial and Research Use: Intended for a wide range of applications in English.
  • Assistant-like Chat: The instruction-tuned version excels in dialogue-based use cases.
  • Natural Language Generation: Pretrained models can be adapted for various NLG tasks.
  • Code Generation: Demonstrates strong performance in coding benchmarks like HumanEval.
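For the assistant-like chat use case, the instruction-tuned variant expects prompts in Meta's published Llama 3 chat format. The sketch below builds that format by hand to show its structure; in practice you would normally let a tokenizer's `apply_chat_template` handle this.

```python
# Sketch of the Llama 3 instruct prompt format, using the special tokens from
# Meta's published template (<|begin_of_text|>, <|start_header_id|>, <|eot_id|>).

def build_llama3_prompt(messages):
    """messages: list of {'role': 'system'|'user'|'assistant', 'content': str}."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # End with an open assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize Grouped-Query Attention in one sentence."},
])
print(prompt)
```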

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model cover the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
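These parameters map directly onto the request body of an OpenAI-compatible chat completions call. The sketch below assembles such a request; the sampler values are illustrative defaults, not the actual Featherless user configurations, and the endpoint path is an assumption based on the common OpenAI-compatible schema.

```python
import json

# Hypothetical request body using the sampler parameters listed above.
# Values are illustrative, not the actual top user configs for this model.
payload = {
    "model": "casperhansen/llama-3-70b-fp16",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,
    "min_p": 0.05,
}

body = json.dumps(payload)
# POST `body` (with an API key header) to the provider's chat completions
# endpoint, e.g. /v1/chat/completions on an OpenAI-compatible server.
print(body)
```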