m-a-p/OpenLLaMA-Reproduce-872.42B
OpenLLaMA 7Bv2 is a 7 billion parameter language model developed by m-a-p, designed to produce high-quality, contextually relevant text. It was trained on a diverse composite dataset including web-crawled data, scholarly articles, and literature, giving it broad domain coverage. The model is optimized for general-purpose text generation and understanding across a wide range of topics, with a 4096-token context length.
OpenLLaMA 7Bv2 Overview
OpenLLaMA 7Bv2 is a 7 billion parameter language model focused on generating high-quality, contextually relevant text. It stands out for its training on a diverse composite dataset that spans web-crawled data, scholarly articles, a wide array of literature, and question-answer pairs.
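As a minimal usage sketch, the checkpoint should load through the standard Hugging Face transformers causal-LM interface. The repository ID below is a hypothetical placeholder, not a path confirmed by this card; substitute the actual published checkpoint name.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "m-a-p/OpenLLaMA-Reproduce"  # hypothetical ID; replace with the real checkpoint path

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision keeps 7B weights within ~14 GB of GPU memory
    device_map="auto",          # place weights automatically across available devices
)
```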
Key Training Details
The model's training incorporated a rich composite dataset comprising the following sources (a mixing sketch follows the list):
- Falcon RefinedWeb dataset: For broad, filtered internet knowledge.
- StarCoder datasets: Contributing code and code-adjacent understanding.
- Wikipedia: Providing encyclopedic knowledge.
- arXiv: For scientific and academic comprehension.
- Extensive book collections: Covering multiple genres.
- RedPajama's Stack Exchange data: Enhancing question-answering capabilities.
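The exact sampling recipe is not published here, but a composite mixture of this kind is commonly assembled by interleaving streaming datasets. The sketch below uses the Hugging Face datasets library; the dataset IDs are the public versions of the sources named above, and the sampling probabilities are illustrative assumptions, not the card's actual recipe.

```python
from datasets import load_dataset, interleave_datasets

# Stream each source and normalize to a single "text" column so they can be interleaved.
falcon = (load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
          .select_columns(["content"]).rename_column("content", "text"))
code = (load_dataset("bigcode/starcoderdata", split="train", streaming=True)
        .select_columns(["content"]).rename_column("content", "text"))
wiki = (load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
        .select_columns(["text"]))

mixture = interleave_datasets(
    [falcon, code, wiki],
    probabilities=[0.75, 0.15, 0.10],  # hypothetical sampling weights
    seed=42,
)
print(next(iter(mixture))["text"][:200])  # peek at one interleaved example
```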
The training procedure used a maximum learning rate of 3e-4, a minimum of 3e-5, and a substantial batch size of 4 million tokens. The learning-rate schedule closely mirrors Llama 2's: linear warmup followed by cosine decay to 10% of the peak rate, which is exactly the 3e-5 floor quoted above.
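A minimal sketch of such a warmup-plus-cosine schedule follows; the warmup length is an assumption (Llama 2 used 2000 warmup steps), as the card does not state it.

```python
import math

MAX_LR = 3e-4        # peak learning rate from the card
MIN_LR = 3e-5        # floor: 10% of peak, as in Llama 2
WARMUP_STEPS = 2000  # assumed; not stated on this card

def lr_at(step: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay from MAX_LR down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return MIN_LR + 0.5 * (MAX_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))
```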
Potential Use Cases
Given its diverse training data, OpenLLaMA 7Bv2 is well-suited for:
- General text generation: Creating coherent and contextually appropriate text for various applications.
- Content summarization: Condensing information from diverse sources.
- Question answering: Providing informed responses from its broad knowledge base (see the usage sketch after this list).
- Research assistance: Aiding in understanding academic and scientific texts.
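As a short usage sketch tying these use cases to the loading snippet above: OpenLLaMA 7Bv2 is a base model, so it completes text rather than following instructions, and prompts work best when phrased as continuations, for example a Q/A template for question answering.

```python
# Continuing from the loading snippet above (tokenizer and model already created).
prompt = "Q: Summarize the main idea of the theory of relativity.\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```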