amdnsr/llama-7b-hf
The amdnsr/llama-7b-hf model is a 7-billion-parameter auto-regressive language model based on the Transformer architecture, developed by the FAIR team of Meta AI. This foundational model is primarily intended for research into large language models: understanding their capabilities and limitations, and developing techniques to mitigate bias and harmful content. It performs well on common sense reasoning, reading comprehension, and natural language understanding tasks, and has been evaluated on benchmarks such as MMLU and BIG-bench Hard.
Overview
amdnsr/llama-7b-hf is a 7-billion-parameter version of the LLaMA (Large Language Model Meta AI) foundational model, developed by Meta AI's FAIR team. Trained between December 2022 and February 2023, it is an auto-regressive language model built on the Transformer architecture and has been converted for compatibility with the Hugging Face Transformers library.
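As a quick illustration of that compatibility, here is a minimal loading sketch (not part of the original card). It assumes the `transformers`, `accelerate`, and `torch` packages are installed and that enough memory is available for the roughly 13 GB of fp16 weights; the dtype and device placement are choices made here, not requirements of the checkpoint:

```python
# Minimal loading sketch for the converted checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amdnsr/llama-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision roughly halves memory use vs. fp32
    device_map="auto",          # requires `accelerate`; places weights on available device(s)
)
```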
Key Capabilities
- Research on Large Language Models: Primarily designed for academic and research purposes to explore applications, understand limitations, and develop improvements for LLMs.
- Common Sense Reasoning: Demonstrates strong performance on benchmarks such as BoolQ (76.5%), HellaSwag (76.1%), and COPA (93%); a zero-shot prompting sketch follows this list.
- Natural Language Understanding: Evaluated on tasks such as reading comprehension and general NLU, with reported results on MMLU and BIG-bench Hard.
- Bias Evaluation: The model's biases related to gender, religion, race, sexual orientation, age, nationality, disability, physical appearance, and socioeconomic status have been evaluated.
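As referenced above, here is a hypothetical zero-shot prompt in the style of a BoolQ item, continuing from the loading sketch. The prompt format and the example passage are illustrative assumptions, not the evaluation harness behind the reported numbers:

```python
# Zero-shot yes/no prompt in the style of a BoolQ item (illustrative only).
prompt = (
    "Passage: The Great Barrier Reef is the world's largest coral reef "
    "system, located off the coast of Queensland, Australia.\n"
    "Question: Is the Great Barrier Reef located in Australia?\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens, not the prompt.
answer = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(answer)
```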
Training and Data
The LLaMA 7B model was trained on 1 trillion tokens from a diverse dataset including CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%). The training data included content in 20 languages, though English constitutes the majority, suggesting better performance for English tasks.
Intended Use Cases
This is a base (foundational) model intended for researchers in natural language processing, machine learning, and artificial intelligence. It is suitable for:
- Exploring potential applications like question answering and reading comprehension.
- Understanding the capabilities and limitations of current language models (a configuration-inspection sketch follows this list).
- Developing techniques to improve model performance and mitigate issues like bias, toxicity, and hallucinations.
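As referenced in the list above, a simple way to start probing the model is to inspect the converted checkpoint's configuration. This sketch continues from the loading example; the expected values in the comments are the standard LLaMA-7B hyperparameters rather than anything stated on this card, so verify them against the checkpoint itself:

```python
# Inspect the architecture of the loaded checkpoint.
print(model.config.num_hidden_layers)    # expected: 32 transformer blocks
print(model.config.hidden_size)          # expected: 4096
print(model.config.num_attention_heads)  # expected: 32

# Count parameters; LLaMA-7B is roughly 6.7 billion.
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")
```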
Out-of-scope uses include direct deployment in downstream applications without further risk evaluation and mitigation: the model has not been trained with human feedback and may generate toxic, offensive, or factually incorrect content.