LLM360/Amber: An Open-Source Training Resource
LLM360/Amber is a 7-billion-parameter English language model built on the LLaMA architecture and released by the LLM360 initiative. Its primary purpose is to foster transparency and accessibility in LLM training: all 360 intermediate model checkpoints, the full pre-training dataset, the source code, and the training configurations are publicly available under an Apache 2.0 license.
Key Characteristics & Transparency
- Architecture: LLaMA-7B equivalent with 6.7B total parameters and a 2048-token context length.
- Open Access: Provides unprecedented access to the entire training trajectory through 360 checkpoints, letting researchers analyze how model behavior evolves across stages of training.
- Comprehensive Data: The fully processed pre-training data, totaling over 1.2 trillion tokens from sources including RefinedWeb, StarCoder, and C4, is also accessible.
- Training Details: Detailed training logs and evaluation results are available via a dedicated W&B project page, alongside links to training code and data preparation scripts.
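Because each intermediate checkpoint is published as its own revision on the Hugging Face Hub, any stage of training can be loaded directly with the `transformers` library. The sketch below assumes revision tags of the form `ckpt_NNN` (e.g. `ckpt_356`); verify the exact tag names on the Hub repository before relying on them.

```python
# Sketch: loading an intermediate Amber checkpoint from the Hugging Face Hub.
# Assumption: checkpoint revisions are tagged "ckpt_000" through "ckpt_359";
# check the LLM360/Amber repository's branch list for the actual names.

def checkpoint_revision(step: int) -> str:
    """Format a checkpoint index (0-359) as the assumed revision tag."""
    if not 0 <= step <= 359:
        raise ValueError("Amber publishes 360 checkpoints, indexed 0-359")
    return f"ckpt_{step:03d}"

def load_amber(step: int):
    """Load one intermediate checkpoint (downloads ~13 GB of weights)."""
    from transformers import LlamaForCausalLM, LlamaTokenizer
    revision = checkpoint_revision(step)
    tokenizer = LlamaTokenizer.from_pretrained("LLM360/Amber", revision=revision)
    model = LlamaForCausalLM.from_pretrained("LLM360/Amber", revision=revision)
    return tokenizer, model

print(checkpoint_revision(356))  # ckpt_356
```

Loading several revisions this way is what makes trajectory studies possible, for example comparing a probe or benchmark score across training stages.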
Performance & Intended Use
Amber is explicitly described as not a state-of-the-art model; reported evaluation scores include ARC-C 42.57, HellaSwag 73.91, and MMLU 28.53. Its value lies not in benchmark-topping performance but in its role as a research and educational tool for studying the intricacies of large language model training. It serves as a foundational resource for the community to deepen its understanding of LLMs.