OpenLLaMA: An Open Reproduction of LLaMA
OpenLLaMA, developed by openlm-research, is a permissively licensed open-source reproduction of Meta AI's LLaMA large language model. The models, released in 3B, 7B, and 13B parameter variants, were trained on 1 trillion tokens from the RedPajama dataset, an open reproduction of the LLaMA training data. The project followed the original LLaMA paper's preprocessing steps, model architecture, context length, training steps, learning rate schedule, and optimizer, with the primary difference being the use of the RedPajama dataset.
Key Capabilities & Features
- LLaMA Architecture Reproduction: Faithfully replicates the LLaMA model architecture and training methodology.
- Permissive Licensing: Released under the Apache 2.0 license, allowing for broad usage and integration.
- Comparable Performance: Achieves performance comparable to the original LLaMA 7B and GPT-J 6B across a wide range of evaluation tasks, and in some cases, outperforms them.
- Hugging Face Transformers Integration: Weights are available in PyTorch format and can be loaded with the Hugging Face transformers library (see the loading sketch after this list).
- Scratch-Trained Tokenizer: Unlike the original LLaMA, OpenLLaMA's tokenizer and weights are trained entirely from scratch, removing the dependency on obtaining the original LLaMA tokenizer.
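
For illustration, here is a minimal loading and generation sketch. It assumes the 7B weights are published under an openlm-research/open_llama_7b repository on the Hugging Face Hub and that a transformers release with LLaMA support (plus accelerate for device placement) is installed; exact repository names and recommended tokenizer settings may differ between model versions.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Assumed Hub path for the 7B weights; adjust for the 3B/13B variants.
model_path = "openlm-research/open_llama_7b"

# Load the scratch-trained tokenizer and the PyTorch weights.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    device_map="auto",          # place layers on available devices (requires accelerate)
)

# Simple question-answering style prompt.
prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Greedy generation of a short continuation.
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The same pattern applies to the other model sizes; only the repository path changes.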
When to Use This Model
OpenLLaMA is suitable for developers and researchers seeking a capable, openly licensed language model that mirrors the original LLaMA. Its strong performance on general language benchmarks makes it a versatile choice for text generation, question answering, and other natural language processing applications, especially when an Apache 2.0 licensed model is preferred.