m-a-p/OpenLLaMA-Reproduce-436.21B
m-a-p/OpenLLaMA-Reproduce-436.21B is a 7-billion-parameter language model in the OpenLLaMA family, designed for high-quality, contextually relevant text generation. It was trained on a diverse composite dataset spanning web-crawled data, scholarly articles, and question-answer pairs for broad domain coverage, and is optimized for general-purpose text generation and understanding, using a training strategy similar to Llama2's for stable convergence.
OpenLLaMA 7Bv2 Overview
m-a-p/OpenLLaMA-Reproduce-436.21B reproduces the OpenLLaMA 7B v2 recipe at the 7-billion-parameter scale. Its training corpus is a composite of web, code, encyclopedic, scientific, and community-sourced data, chosen to ensure broad applicability across domains.
Key Capabilities
- Broad Domain Understanding: Trained on a comprehensive dataset including web data (Falcon RefinedWeb), code (StarCoder datasets), encyclopedic knowledge (Wikipedia), scientific papers (arXiv), and a large collection of books and Stack Exchange data.
- Contextual Text Generation: Designed to produce contextually relevant text outputs, making it suitable for a wide range of language tasks.
- Optimized Training: Uses a peak learning rate of 3e-4 decaying to a minimum of 3e-5, with a batch size of 4 million tokens. The learning-rate schedule closely follows the one used for Llama2, for efficient and stable convergence.
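The card states only the peak learning rate (3e-4), the floor (3e-5), and the 4M-token batch size; Llama2's published schedule is linear warmup followed by cosine decay. The sketch below illustrates that shape under assumed warmup and total step counts (the card does not specify either):

```python
import math

def cosine_lr(step: int, warmup_steps: int, total_steps: int,
              max_lr: float = 3e-4, min_lr: float = 3e-5) -> float:
    """Llama2-style schedule: linear warmup to max_lr, then cosine decay to min_lr.

    max_lr and min_lr come from the model card; warmup_steps and
    total_steps are illustrative assumptions, not documented values.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak rate.
        return max_lr * step / warmup_steps
    # Cosine decay from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example with assumed 2,000 warmup steps over a 100,000-step run:
print(cosine_lr(0, 2_000, 100_000))        # start of warmup: 0.0
print(cosine_lr(2_000, 2_000, 100_000))    # peak: 3e-4
print(cosine_lr(100_000, 2_000, 100_000))  # floor: 3e-5
```

At a batch size of 4 million tokens, each optimizer step consumes 4M tokens, so the total step count is simply the token budget divided by 4M.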
Good For
- General-purpose text generation and completion.
- Applications requiring broad knowledge across web, scientific, and literary domains.
- Tasks benefiting from a model trained with a robust and diverse data mixture.
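For general-purpose generation, the model can be loaded like other LLaMA-family checkpoints via Hugging Face Transformers. This is a minimal sketch, assuming the checkpoint is published in standard Transformers format (the card does not document a loading recipe, and the generation settings shown are illustrative defaults):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "m-a-p/OpenLLaMA-Reproduce-436.21B"

# Load tokenizer and model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Simple text completion: encode a prompt, sample a continuation, decode it.
prompt = "The three most cited arXiv papers in machine learning are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,   # illustrative cap on the completion length
    do_sample=True,      # sampling rather than greedy decoding
    temperature=0.7,     # assumed setting, tune per task
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

As a base model with no instruction tuning mentioned on the card, it is best driven with completion-style prompts rather than chat-style instructions.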