openlm-research/open_llama_13b
OpenLLaMA 13B: An Open Reproduction of LLaMA
OpenLLaMA 13B is a 13 billion parameter large language model developed by openlm-research, designed as an open-source, permissively licensed (Apache 2.0) reproduction of Meta AI's LLaMA architecture. It is part of a series including 3B and 7B models, all trained on 1 trillion tokens from the RedPajama dataset.
Key Capabilities & Features
- Architecture Replication: Follows the same model architecture, context length, training steps, learning rate schedule, and optimizer as the original LLaMA paper.
- Dataset: Trained on the RedPajama dataset, an open-source reproduction of the LLaMA training dataset.
- Performance: Achieves comparable performance to the original LLaMA and GPT-J across a wide range of tasks, and in some cases, outperforms them, as evaluated using lm-evaluation-harness.
- Flexible Usage: Weights are released in both EasyLM and PyTorch formats, compatible with the Hugging Face transformers library.
- Training Framework: Developed using EasyLM, a JAX-based training pipeline, leveraging cloud TPU-v4s.
Good For
- Research and Development: Ideal for researchers and developers seeking an open-source, LLaMA-like model with a permissive license.
- Benchmarking: Useful for comparing against other open-source models and understanding the impact of different training datasets.
- Experimentation: Provides a solid foundation for fine-tuning and further development of large language models.
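For the benchmarking use case, an evaluation run with EleutherAI's lm-evaluation-harness (the tool used for the reported comparisons) might look like the following. Flag names and task identifiers vary between harness versions, so treat this as an illustrative invocation rather than an exact command:

```shell
# Hypothetical lm-evaluation-harness invocation; check your installed
# version's CLI for the exact flag and task names.
lm_eval --model hf \
    --model_args pretrained=openlm-research/open_llama_13b,dtype=float16 \
    --tasks hellaswag,arc_easy \
    --batch_size 8
```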