shaohang/Sparse_llama-7B
shaohang/Sparse_llama-7B is a 7-billion-parameter auto-regressive language model based on the Transformer architecture, developed by the FAIR team of Meta AI. This version is a conversion of the original LLaMA-7B weights to work with HuggingFace Transformers. It is primarily intended for research on large language models, focusing on understanding their capabilities and limitations and on mitigating biases, rather than on direct downstream applications.
Overview
shaohang/Sparse_llama-7B is a 7-billion-parameter LLaMA model, originally developed by Meta AI's FAIR team and converted for use with HuggingFace Transformers. LLaMA is an auto-regressive language model built on the Transformer architecture, trained between December 2022 and February 2023. This specific model is version 1 of the 7B-parameter variant, with a context length of 2048 tokens.
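Since the checkpoint is in HuggingFace Transformers format, it can be loaded with the standard `Auto*` classes. The sketch below is a minimal, hedged example (it assumes `transformers` and `torch` are installed and will download several gigabytes of weights on first use); the model is not loaded at import time so the helper can be inspected cheaply.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "shaohang/Sparse_llama-7B"


def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and causal-LM weights from the Hub.

    Note: downloads ~13 GB of weights on first call; a GPU is
    recommended for inference but not required.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

For quick experiments, `pipeline("text-generation", model=MODEL_ID)` offers an equivalent one-liner.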
Key Characteristics
- Architecture: Transformer-based, auto-regressive language model.
- Training Data: Trained on a diverse dataset including CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%).
- Multilingual Support: While predominantly English, the training data included 20 languages (bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk) from Wikipedia and Books domains.
- Performance: Achieves scores such as 76.5 on BoolQ, 79.8 on PIQA, and 76.1 on HellaSwag for reasoning tasks.
- Bias Evaluation: Evaluated for biases across categories such as gender, religion, race, and age, with an average bias score of 66.6 (lower scores indicate less measured bias).
Intended Use Cases
This model is primarily intended for research purposes in large language models, including:
- Exploring potential applications like question answering and natural language understanding.
- Understanding the capabilities and limitations of current language models.
- Developing techniques to improve models and mitigate biases, risks, and harmful content generation.
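For research uses such as question answering, a plain generation call is the typical entry point. The following is a hypothetical sketch (function name and parameters are illustrative, not part of the model card); it uses greedy decoding via `model.generate` and returns only the newly generated tokens.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(prompt: str,
             model_id: str = "shaohang/Sparse_llama-7B",
             max_new_tokens: int = 64) -> str:
    """Greedy-decode a continuation of `prompt` (illustrative helper)."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding for reproducibility
    )
    # Strip the prompt tokens so only the model's continuation is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

As a base (non-instruction-tuned) model, it continues text rather than following instructions, so prompts should be phrased as completions (e.g. "Q: ... A:").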
Limitations and Out-of-Scope Uses
As a foundational model, LLaMA-7B is not intended for direct use in downstream applications without further risk evaluation and mitigation. It has not been trained with human feedback and may generate toxic, offensive, or incorrect information. Users should be aware of potential biases inherited from its web-sourced training data.