casperhansen/llama-3-8b-fp16
The casperhansen/llama-3-8b-fp16 model is an 8-billion-parameter, pre-trained generative text model from the Meta Llama 3 family, developed by Meta. It uses an optimized transformer architecture and has a context length of 8192 tokens. The model is designed for commercial and research use in English, excels at natural language generation tasks, and serves as a strong foundation for further fine-tuning.
Model Overview
The casperhansen/llama-3-8b-fp16 is an 8 billion parameter variant of Meta's Llama 3 family of large language models. Developed by Meta, this model is a pre-trained, auto-regressive language model built on an optimized transformer architecture, featuring Grouped-Query Attention (GQA) for enhanced inference scalability. It was trained on over 15 trillion tokens of publicly available online data, with a knowledge cutoff of March 2023, and supports an 8k token context length.
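The Grouped-Query Attention mentioned above reduces the key/value cache by letting several query heads share one key/value head. A minimal NumPy sketch of the idea, assuming the head counts published for the Llama 3 8B configuration (32 query heads, 8 key/value heads, head dimension 128; the random inputs are purely illustrative):

```python
import numpy as np

# Assumed Llama 3 8B shapes: 32 query heads share 8 KV heads (groups of 4).
n_q_heads, n_kv_heads, head_dim, seq_len = 32, 8, 128, 16
group = n_q_heads // n_kv_heads  # 4 query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))  # 4x smaller KV cache
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

# Broadcast each KV head to its group of query heads.
k_exp = np.repeat(k, group, axis=0)  # (32, seq_len, head_dim)
v_exp = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention over the expanded KV heads.
scores = q @ k_exp.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ v_exp  # (32, seq_len, head_dim), same shape as full MHA output
```

The output matches ordinary multi-head attention in shape, but only 8 key/value heads are stored, which is what improves inference scalability at long context lengths.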
Key Capabilities
- General Language Understanding: Demonstrates strong performance across various general benchmarks, including MMLU (66.6), AGIEval English (45.9), and ARC-Challenge (78.6).
- Knowledge Reasoning: Achieves 78.5 on TriviaQA-Wiki, indicating solid knowledge retrieval capabilities.
- Reading Comprehension: Performs well on tasks like SQuAD (76.4) and BoolQ (75.7).
- Foundation Model: Intended for commercial and research use in English, serving as a robust base for diverse natural language generation tasks.
Good For
- Natural Language Generation: Ideal for applications requiring text generation, summarization, and other generative tasks.
- Research and Development: Suitable for researchers exploring LLM capabilities and developing new applications.
- Further Fine-tuning: Provides a strong base for fine-tuning on specific datasets or for specialized use cases, including adaptation to languages beyond English (in compliance with the Llama 3 Community License).
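As a starting point for the generation use cases above, a hedged sketch of loading the model with the Hugging Face transformers library (an assumed integration path, not documented here; the fp16 weights need roughly 16 GB of accelerator memory). Since this is the pre-trained base model rather than an instruct variant, plain text completion is used, with no chat template:

```python
def build_prompt(instruction: str) -> str:
    # The base model is a plain completion model, so a trimmed text
    # prefix is sufficient; no chat formatting is applied.
    return instruction.strip() + "\n"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports kept local so the sketch reads without the heavy
    # dependencies installed; requires transformers, torch, accelerate.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "casperhansen/llama-3-8b-fp16"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keeps the fp16 weights as stored
        device_map="auto",    # places layers on available devices
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

For example, `generate(build_prompt("The three laws of thermodynamics are"))` would return a free-form continuation; for fine-tuning, the same checkpoint can be passed to standard training loops or libraries such as PEFT.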