m-a-p/OpenLLaMA-Reproduce-654.31B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 1, 2024 · Architecture: Transformer

OpenLLaMA-Reproduce-654.31B is a 7 billion parameter language model developed by m-a-p, designed for high-quality, contextually relevant text predictions. It was trained on a diverse composite dataset including web-crawled data, scholarly articles, and literature to ensure broad domain coverage. This model is optimized for general-purpose text generation and understanding across various topics.

OpenLLaMA 7Bv2 Overview

OpenLLaMA 7Bv2 is a 7 billion parameter language model developed by m-a-p, engineered to provide high-quality and contextually relevant text predictions. Its training focused on broad domain coverage, utilizing a diverse composite dataset.

Key Training Details

The model was trained on a rich and varied dataset to ensure comprehensive knowledge acquisition; an illustrative mixture-sampling sketch follows the list. This dataset includes:

  • Web-crawled data: the Falcon RefinedWeb dataset and the StarCoder datasets.
  • Encyclopedic knowledge: Contributions from Wikipedia.
  • Scientific understanding: Academic papers sourced from arXiv.
  • Extensive literature: A vast collection of books spanning multiple genres.
  • Curated Q&A: Stack Exchange data, as curated by RedPajama.
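
The exact mixture proportions are not published on this card. Purely as an illustration, the sketch below shows how documents from several sources can be sampled according to fixed mixture weights; every source name and weight in it is hypothetical, not the recipe actually used for this model.

```python
import random

# Illustrative only: these sources and weights are assumptions, not the
# actual mixture used to train this model.
sources = {
    "web":    ["refined-web doc 1", "refined-web doc 2"],
    "code":   ["starcoder file 1"],
    "wiki":   ["wikipedia article 1"],
    "arxiv":  ["arxiv paper 1"],
    "books":  ["book chapter 1"],
    "stackx": ["stack exchange thread 1"],
}
weights = {"web": 0.55, "code": 0.15, "wiki": 0.05,
           "arxiv": 0.10, "books": 0.10, "stackx": 0.05}

def sample_mixture(n_docs, seed=0):
    """Yield n_docs documents, drawing each source with its mixture weight."""
    rng = random.Random(seed)
    names = list(sources)
    probs = [weights[name] for name in names]
    for _ in range(n_docs):
        name = rng.choices(names, weights=probs, k=1)[0]
        yield name, rng.choice(sources[name])

for source, doc in sample_mixture(5):
    print(f"{source}: {doc}")
```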

The training procedure used a maximum learning rate of 3e-4 decaying to a minimum of 3e-5, with a batch size of 4 million tokens. The learning rate schedule closely mirrors the one used for Llama 2, aiding stable convergence.
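
Llama 2 used linear warmup followed by cosine decay to 10% of the peak learning rate, which matches the 3e-4 / 3e-5 pair quoted above. Below is a minimal sketch of such a schedule; the warmup and total step counts are assumptions for illustration, since the card does not state them.

```python
import math

# Peak and floor learning rates from the card; step counts are assumptions.
MAX_LR = 3e-4
MIN_LR = 3e-5
WARMUP_STEPS = 2_000
TOTAL_STEPS = 250_000

def lr_at(step: int) -> float:
    """Llama 2-style schedule: linear warmup, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine

print(lr_at(0), lr_at(WARMUP_STEPS), lr_at(TOTAL_STEPS))  # 0 -> 3e-4 -> 3e-5
```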

Use Cases

This model is well-suited for general text generation tasks, question answering, and applications requiring broad contextual understanding due to its diverse training data.
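
As a minimal usage sketch with the Hugging Face transformers library, assuming the checkpoint is published under the repo id at the top of this card and exposes the standard causal-LM interface:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/OpenLLaMA-Reproduce-654.31B"  # repo id from this card (assumed available)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Question: What causes ocean tides?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```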