casperhansen/llama-3-8b-fp16

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8K · License: llama3 · Architecture: Transformer

The casperhansen/llama-3-8b-fp16 model is an 8-billion-parameter, pre-trained generative text model from the Meta Llama 3 family, developed by Meta. It uses an optimized transformer architecture and has a context length of 8192 tokens. The model is designed for commercial and research use in English, excelling at natural language generation tasks and serving as a strong foundation for further fine-tuning.


Model Overview

The casperhansen/llama-3-8b-fp16 is an 8 billion parameter variant of Meta's Llama 3 family of large language models. Developed by Meta, this model is a pre-trained, auto-regressive language model built on an optimized transformer architecture, featuring Grouped-Query Attention (GQA) for enhanced inference scalability. It was trained on over 15 trillion tokens of publicly available online data, with a knowledge cutoff of March 2023, and supports an 8k token context length.
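The inference-scalability benefit of GQA mentioned above comes from caching far fewer key/value heads than query heads. A minimal back-of-the-envelope sketch, using the published Llama 3 8B hyperparameters (32 layers, 32 query heads, 8 key/value heads, head dimension 128) and assuming an FP16 KV cache:

```python
# KV-cache size per token for Llama 3 8B, illustrating the GQA savings.
LAYERS = 32
KV_HEADS = 8        # grouped-query attention: 8 shared key/value heads
QUERY_HEADS = 32    # what full multi-head attention would have to cache
HEAD_DIM = 128
BYTES_FP16 = 2      # assumption: KV cache held in FP16

def kv_cache_bytes_per_token(kv_heads: int) -> int:
    # Factor of 2 for separate key and value tensors, per layer.
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES_FP16

gqa = kv_cache_bytes_per_token(KV_HEADS)     # 131072 bytes (128 KiB) per token
mha = kv_cache_bytes_per_token(QUERY_HEADS)  # 524288 bytes (512 KiB) per token
print(f"GQA saves {mha // gqa}x on KV cache")

# At the full 8192-token context, the FP16 KV cache is exactly 1 GiB:
print(f"Full-context KV cache: {gqa * 8192 / 2**30:.1f} GiB")
```

The 4x smaller cache is what lets servers hold more concurrent sequences in memory, which is why GQA improves inference scalability rather than model quality.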

Key Capabilities

  • General Language Understanding: Demonstrates strong performance across various general benchmarks, including MMLU (66.6), AGIEval English (45.9), and ARC-Challenge (78.6).
  • Knowledge Reasoning: Achieves 78.5 on TriviaQA-Wiki, indicating solid knowledge retrieval capabilities.
  • Reading Comprehension: Performs well on tasks like SQuAD (76.4) and BoolQ (75.7).
  • Foundation Model: Intended for commercial and research use in English, serving as a robust base for diverse natural language generation tasks.

Good For

  • Natural Language Generation: Ideal for applications requiring text generation, summarization, and other generative tasks.
  • Research and Development: Suitable for researchers exploring LLM capabilities and developing new applications.
  • Further Fine-tuning: Provides a strong base for fine-tuning on specific datasets or for specialized use cases, including adaptation for languages beyond English (with compliance to the Llama 3 Community License).

Popular Sampler Settings

Top three parameter combinations used by Featherless users for this model.

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
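These sampler settings are typically passed in the body of an OpenAI-compatible completions request. A sketch with illustrative values (the card lists only the parameter names, so the numbers below are assumptions, not the actual top Featherless configs; `top_k`, `repetition_penalty`, and `min_p` are common extensions beyond the core OpenAI schema):

```python
import json

# Hypothetical sampler values for demonstration only.
payload = {
    "model": "casperhansen/llama-3-8b-fp16",
    "prompt": "Once upon a time",
    "max_tokens": 128,
    "temperature": 0.8,
    "top_p": 0.95,
    "top_k": 40,                # extension beyond the core OpenAI schema
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.1,  # extension
    "min_p": 0.05,              # extension
}
body = json.dumps(payload)
# POST `body` to an OpenAI-compatible /v1/completions endpoint, e.g. with
# requests.post(url, headers={"Authorization": f"Bearer {key}"}, data=body)
```

Lower `temperature` with a modest `min_p` tends toward focused completions; raising `temperature` and `top_p` increases diversity at the cost of coherence.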