m-a-p/OpenLLaMA-Reproduce-1409.29B
m-a-p/OpenLLaMA-Reproduce-1409.29B is a 7 billion parameter language model in the OpenLLaMA family, designed to produce high-quality, contextually relevant text predictions. It was trained on a diverse composite dataset spanning web data, scholarly articles, and literature to ensure broad domain coverage, and is optimized for general-purpose text generation and understanding across a wide range of topics.
OpenLLaMA 7Bv2 Model Overview
This model, OpenLLaMA 7Bv2, is a 7 billion parameter language model developed to provide high-quality and contextually relevant text predictions. It distinguishes itself through its training on a highly diverse composite dataset, which includes a wide array of sources to ensure broad applicability and robust understanding across various domains.
Key Training Data Sources
The model's training leveraged a rich and varied dataset, contributing to its comprehensive knowledge base:
- Falcon RefinedWeb dataset: General web-crawled text.
- StarCoder datasets: Likely contributing to code-related understanding.
- Wikipedia: Providing encyclopedic knowledge.
- arXiv: Incorporating academic papers for scientific understanding.
- Extensive book collections: Covering multiple genres for broad literary context.
- Stack Exchange data: Curated by RedPajama, offering question-answer pairs and technical discussions.
Training Procedure Highlights
The training process for OpenLLaMA 7Bv2 was designed for efficiency and stable convergence:
- Learning Rate: A peak learning rate of 3e-4, decaying to a minimum of 3e-5.
- Batch Size: Employed a substantial batch size of 4 million tokens.
- Learning Rate Scheduler: The scheduling strategy closely mirrors that used in Llama2, ensuring stable and optimal convergence during training.
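A Llama2-style schedule typically means a linear warmup followed by cosine decay from the peak learning rate down to the floor. Here is a minimal sketch using the 3e-4 / 3e-5 values above; the `warmup_steps` and `max_steps` values are illustrative assumptions, not figures from this model card:

```python
import math

def cosine_lr(step, max_steps, warmup_steps=2000, max_lr=3e-4, min_lr=3e-5):
    """Llama2-style schedule: linear warmup, then cosine decay to min_lr.

    warmup_steps and max_steps are assumed values for illustration only.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr.
        return max_lr * step / warmup_steps
    # Cosine decay from max_lr at the end of warmup to min_lr at max_steps.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# The learning rate peaks at 3e-4 right after warmup and ends at 3e-5.
print(cosine_lr(2000, 100_000))    # peak
print(cosine_lr(100_000, 100_000)) # floor
```

Note that the 3e-5 minimum is 10% of the 3e-4 peak, matching the decay-to-10%-of-peak convention used in the Llama2 training recipe.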