m-a-p/OpenLLaMA-Reproduce-100.66B
m-a-p/OpenLLaMA-Reproduce-100.66B is a 7-billion-parameter language model in the OpenLLaMA family, trained on a diverse composite dataset spanning web-crawled text, scholarly articles, and literature. It aims to deliver high-quality, contextually relevant text predictions across broad domains. The model is designed for general-purpose text generation and understanding tasks and follows a training procedure similar to Llama2's for stable, efficient convergence.
OpenLLaMA 7Bv2 Overview
m-a-p/OpenLLaMA-Reproduce-100.66B, also known as OpenLLaMA 7Bv2, is a 7-billion-parameter language model developed by m-a-p. It is engineered to provide high-quality, contextually relevant text predictions by leveraging a comprehensive and diverse training dataset. Its training methodology follows strategies similar to those used for Llama2, supporting robust performance and efficient convergence.
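A minimal loading sketch is shown below. It assumes the checkpoint is published in a standard Hugging Face transformers-compatible (LLaMA-style) format; the repo id is taken from this card, and the dtype/device settings are illustrative defaults rather than values from the original release.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id from this card; assumes a transformers-compatible checkpoint.
model_id = "m-a-p/OpenLLaMA-Reproduce-100.66B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs/CPU (requires accelerate)
)
```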
Key Capabilities & Training
- Diverse Training Data: Trained on a composite dataset comprising the Falcon refined-web dataset, the StarCoder datasets, and the Wikipedia, arXiv, books, and Stack Exchange portions curated by RedPajama. This broad data mix provides wide domain coverage and applicability.
- Optimized Training Procedure: Uses a maximum learning rate of 3e-4, a minimum learning rate of 3e-5, and a batch size of 4 million tokens. The learning-rate schedule closely follows the one employed for Llama2, decaying gradually for stable convergence (see the sketch after this list).
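The sketch below shows one way a Llama2-style schedule consistent with these numbers could look: linear warmup followed by cosine decay from the 3e-4 peak to the 3e-5 floor. The `warmup_steps` and `total_steps` values are illustrative assumptions, not figures from this card.

```python
import math

def llama2_style_lr(step, total_steps, warmup_steps=2000,
                    max_lr=3e-4, min_lr=3e-5):
    """Linear warmup, then cosine decay from max_lr down to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine
```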
Good For
- General Text Generation: Produces contextually relevant text across a wide range of topics (see the generation example after this list).
- Broad Domain Understanding: Benefits from its diverse training data, making it suitable for tasks requiring knowledge from web content, scientific articles, and literature.
- Research and Development: Provides a foundation for further fine-tuning or research into language model behavior, given its transparent training details.
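As a usage illustration, the snippet below reuses the model and tokenizer from the loading sketch above to generate a short continuation. The prompt and sampling parameters are arbitrary examples, not recommended settings from the model authors.

```python
prompt = "The key difference between arXiv preprints and journal articles is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```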