baffo32/decapoda-research-llama-7B-hf
baffo32/decapoda-research-llama-7B-hf is a 7-billion-parameter auto-regressive language model based on the transformer architecture, developed by Meta AI's FAIR team. It is a conversion of the original LLaMA-7B weights to the format expected by Hugging Face's Transformers library. The model is intended primarily for research on large language models: exploring applications such as question answering and natural language understanding, and evaluating model capabilities and limitations.
Model Overview
baffo32/decapoda-research-llama-7B-hf is the 7-billion-parameter LLaMA model, developed by Meta AI's FAIR team and converted for use with Hugging Face's Transformers library. This foundational auto-regressive transformer model was trained between December 2022 and February 2023. It belongs to the LLaMA family, which also includes 13B, 33B, and 65B parameter versions.
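Because this checkpoint is a Transformers-compatible conversion, it can be loaded with the standard `AutoTokenizer` / `AutoModelForCausalLM` API. A minimal sketch (assuming `transformers`, `torch`, and `accelerate` are installed; the half-precision and `device_map` settings are illustrative choices for fitting a 7B model on a single GPU, not requirements):

```python
# Minimal loading/generation sketch for the converted LLaMA-7B checkpoint.
# Assumes the `transformers`, `torch`, and `accelerate` packages are installed
# and that sufficient disk space and memory are available for a 7B model.

MODEL_ID = "baffo32/decapoda-research-llama-7B-hf"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model lazily and return a continuation of `prompt`."""
    # Imports are deferred so importing this module stays cheap; the heavy
    # download/load only happens when generate() is actually called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,  # half precision to reduce memory use
        device_map="auto",          # place layers on available devices
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("Large language models are"))
```

As a base model with no instruction tuning, it continues text rather than following instructions, so prompts should be phrased as passages to complete.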
Key Capabilities & Training
- Architecture: Transformer-based, auto-regressive language model.
- Training Data: Trained on a diverse dataset including CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%). The dataset includes content in 20 languages, though English constitutes the majority.
- Evaluation: Assessed on common sense reasoning benchmarks (BoolQ, PIQA, SIQA, HellaSwag, WinoGrande, ARC, OBQA, COPA), question answering, and bias metrics (gender, religion, race, etc.).
- Performance: On the common sense reasoning benchmarks, the 7B model scored 76.5 on BoolQ, 79.8 on PIQA, and 93 on COPA.
Intended Use & Limitations
- Primary Use: Designed for research on large language models: exploring applications such as question answering and natural language understanding, probing model capabilities and limitations, and developing techniques to mitigate bias and harmful content generation.
- Primary Users: Researchers in natural language processing, machine learning, and artificial intelligence.
- Out-of-Scope: As a base model, it is not intended for direct use in downstream applications without further risk evaluation and mitigation. It has not been trained with human feedback and may generate toxic, offensive, or incorrect information. Performance may vary across languages and dialects due to the English-heavy training data.