The elinas/llama-13b-hf-transformers-4.29 model is a 13 billion parameter auto-regressive language model, based on the Transformer architecture, developed by the FAIR team of Meta AI. This version specifically uses weights converted with the latest Hugging Face Transformers library and LlamaTokenizerFast. Primarily intended for research on large language models, it supports exploring applications like question answering and natural language understanding, with a focus on understanding model capabilities and limitations.
Model Overview
elinas/llama-13b-hf-transformers-4.29 is a 13 billion parameter LLaMA model developed by Meta AI's FAIR team. This checkpoint uses weights converted with the Hugging Face Transformers library (as of version 4.29) and LlamaTokenizerFast for improved compatibility. LLaMA is a foundational, auto-regressive language model built on the Transformer architecture, trained between December 2022 and February 2023.
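Because the weights are already in Hugging Face format, the checkpoint can be loaded with the standard transformers API. The sketch below is a minimal example, assuming transformers (>= 4.29) and PyTorch are installed and the weights are reachable under the model id above; the imports are kept inside the function so the snippet stays importable even without those dependencies.

```python
def load_llama(model_id="elinas/llama-13b-hf-transformers-4.29", dtype="auto"):
    """Load the converted LLaMA-13B checkpoint via the transformers API.

    Lazy imports keep this module importable when transformers is not
    installed; the (large) download only happens when the function runs.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # AutoTokenizer resolves to LlamaTokenizerFast for this checkpoint.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    return tokenizer, model
```

Note that loading a 13B model in full precision requires substantial memory; passing a half-precision dtype (e.g. `torch.float16`) is a common mitigation.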
Key Capabilities & Intended Use
- Research Focus: Primarily designed for research into large language models, including exploring applications such as question answering, natural language understanding, and reading comprehension.
- Understanding Limitations: Used for evaluating and mitigating biases, risks, toxic content generation, and hallucinations inherent in LLMs.
- Multilingual Data: While predominantly English, the training data included 20 languages (bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk) for the Wikipedia and Books domains.
Performance & Training
The 13B model was trained on 1 trillion tokens with a batch size of 4 million tokens. It demonstrates competitive performance on various reasoning tasks, achieving 78.1% on BoolQ, 79.2% on HellaSwag, and 94% on COPA. The training dataset comprised CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), and Stack Exchange (2%).
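These figures can be sanity-checked with simple arithmetic: assuming the 4 million batch size is measured in tokens, 1 trillion training tokens corresponds to 250,000 optimizer steps, and the dataset proportions should sum to 100%.

```python
# Dataset mixture proportions as listed in the model card (percent).
mixture = {
    "CCNet": 67.0, "C4": 15.0, "GitHub": 4.5, "Wikipedia": 4.5,
    "Books": 4.5, "ArXiv": 2.5, "Stack Exchange": 2.0,
}
assert sum(mixture.values()) == 100.0  # proportions cover the full dataset

# Assuming the batch size is 4M *tokens* per step, 1T training tokens
# works out to 250,000 optimizer steps.
tokens_total = 1_000_000_000_000
batch_tokens = 4_000_000
steps = tokens_total // batch_tokens
print(steps)  # 250000
```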
Important Considerations
As a base model, LLaMA is not fine-tuned with human feedback and may generate toxic, offensive, or incorrect information. It is explicitly out-of-scope for direct use in downstream applications without further risk evaluation and mitigation. Researchers are encouraged to consult the original paper for detailed information on its architecture, training, and ethical considerations.
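Because it is a base model with no instruction tuning, LLaMA responds best to completion-style prompting rather than chat-style instructions. A minimal, hypothetical few-shot prompt builder illustrates the idea (the `Q:`/`A:` format and the example pairs are illustrative assumptions, not from the model card):

```python
def few_shot_prompt(examples, query):
    """Build a completion-style few-shot prompt for a base language model.

    Base models like LLaMA are not tuned to follow instructions, so a few
    demonstrations are used to steer the continuation instead.
    """
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    # End with a bare "A:" so the model completes the answer.
    return f"{shots}\n\nQ: {query}\nA:"

prompt = few_shot_prompt(
    [("What is the capital of France?", "Paris"),
     ("What is 2 + 2?", "4")],
    "What color is the sky?",
)
```

The trailing `A:` matters: a base model continues text rather than answering a question, so the prompt must make the desired continuation the most likely one.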