openlm-research/open_llama_13b

Text Generation · Concurrency Cost: 1 · Model Size: 13B · Quant: FP8 · Ctx Length: 4K · Published: Jun 15, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

OpenLLaMA 13B is a 13 billion parameter causal language model developed by openlm-research, serving as an open-source reproduction of Meta AI's LLaMA architecture. Trained on 1 trillion tokens from the RedPajama dataset, it aims to replicate LLaMA's training methodology and performance. This model provides a permissively licensed alternative for research and development, demonstrating comparable performance to the original LLaMA and GPT-J across various benchmarks.


OpenLLaMA 13B: An Open Reproduction of LLaMA

OpenLLaMA 13B is a 13 billion parameter large language model developed by openlm-research, designed as an open-source, permissively licensed (Apache 2.0) reproduction of Meta AI's LLaMA architecture. It is part of a series that also includes 3B and 7B models, all trained on 1 trillion tokens from the RedPajama dataset.

Key Capabilities & Features

  • Architecture Replication: Follows the same model architecture, context length, number of training steps, learning rate schedule, and optimizer as the original LLaMA paper.
  • Dataset: Trained on the RedPajama dataset, an open-source reproduction of the LLaMA training dataset.
  • Performance: Achieves comparable performance to the original LLaMA and GPT-J across a wide range of tasks, and in some cases outperforms them, as evaluated with lm-evaluation-harness.
  • Flexible Usage: Weights are released in both EasyLM and PyTorch formats, compatible with the Hugging Face transformers library (see the loading sketch after this list).
  • Training Framework: Developed using EasyLM, a JAX-based training pipeline, leveraging cloud TPU-v4s.
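
As a rough sketch of loading the PyTorch weights through transformers, the snippet below assumes the Hugging Face repo id `openlm-research/open_llama_13b`; the slow `LlamaTokenizer` is used because the project's model card advises against the fast tokenizer, and the prompt and generation parameters are purely illustrative.

```python
# Minimal loading/generation sketch (illustrative; adjust dtype/device to your hardware).
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = "openlm-research/open_llama_13b"  # assumed Hugging Face repo id

# Slow (non-fast) tokenizer, per the OpenLLaMA model card's recommendation.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # illustrative choice; pick what fits your GPU
    device_map="auto",          # requires the accelerate package
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generation settings here are arbitrary examples, not project recommendations.
generation = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(generation[0], skip_special_tokens=True))
```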

Good For

  • Research and Development: Ideal for researchers and developers seeking an open-source, LLaMA-like model with a permissive license.
  • Benchmarking: Useful for comparing against other open-source models and understanding the impact of different training datasets.
  • Experimentation: Provides a solid foundation for fine-tuning and further development of large language models (see the sketch after this list).
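
As one hypothetical way to fine-tune the model parameter-efficiently, the sketch below uses LoRA adapters via the peft library; the rank, target modules, and other settings are assumptions for illustration, not recommendations from the OpenLLaMA project.

```python
# Hypothetical LoRA fine-tuning setup using the peft library.
# Hyperparameters and target modules are illustrative assumptions.
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM
from peft import LoraConfig, get_peft_model

model_path = "openlm-research/open_llama_13b"

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

# Wrap the base model with low-rank adapters on the attention projections.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # LLaMA-style attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# From here, the wrapped model can be passed to a standard transformers Trainer
# (or a custom training loop) together with a tokenized dataset of your choice.
```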