m-a-p/OpenLLaMA-Reproduce-1191.18B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 1, 2024 · Architecture: Transformer

m-a-p/OpenLLaMA-Reproduce-1191.18B is a 7-billion-parameter language model based on OpenLLaMA, trained to produce high-quality, contextually relevant text. It was trained on a diverse composite dataset that includes web-crawled data, scholarly articles, books, and question-answer pairs for broad domain coverage. The model is intended for general-purpose text generation and understanding tasks, with robust performance across a wide range of knowledge domains.


OpenLLaMA 7Bv2 Model Overview

This model, m-a-p/OpenLLaMA-Reproduce-1191.18B, is a 7-billion-parameter language model based on the OpenLLaMA 7Bv2 architecture. It is engineered for high-quality, contextually relevant text predictions, drawing on a comprehensive and diverse training dataset.

Key Capabilities & Training

The model was trained on a composite dataset designed for broad domain coverage (a data-mixing sketch follows this list), including:

  • Web-crawled data and code: the Falcon RefinedWeb and StarCoder datasets.
  • Encyclopedic knowledge: Contributions from Wikipedia.
  • Scientific understanding: Academic papers from arXiv.
  • Diverse literature: A vast collection of books across multiple genres.
  • Curated Q&A: Stack Exchange data curated by RedPajama.
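
The exact mixture weights and preprocessing pipeline are not published here. Purely as an illustration of how such a composite corpus could be assembled, the sketch below interleaves several public sources with the Hugging Face datasets library; the dataset IDs, column names, and sampling probabilities are assumptions for the example, not the model's actual recipe.

```python
# Illustrative sketch only: assembling a composite pre-training mixture.
# Dataset IDs, column names, and probabilities are placeholders, not the
# recipe actually used for this model (books and arXiv sources omitted).
from datasets import load_dataset, interleave_datasets

web  = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
code = load_dataset("bigcode/starcoderdata", split="train", streaming=True)
wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
qa   = load_dataset("togethercomputer/RedPajama-Data-1T", "stackexchange",
                    split="train", streaming=True)

# Normalize each source to a single "text" column (column names vary by dataset).
web  = web.rename_column("content", "text").select_columns(["text"])
code = code.rename_column("content", "text").select_columns(["text"])
wiki = wiki.select_columns(["text"])
qa   = qa.select_columns(["text"])

# Sample from the sources with illustrative mixture weights.
mixture = interleave_datasets(
    [web, code, wiki, qa],
    probabilities=[0.60, 0.15, 0.05, 0.20],
    seed=42,
)
```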

The training procedure used a maximum learning rate of 3e-4, a minimum learning rate of 3e-5, and a batch size of 4 million tokens. The learning-rate schedule closely follows the one used for Llama 2.
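
As a concrete illustration of that schedule, the minimal sketch below implements a Llama 2-style linear warmup followed by cosine decay from the stated maximum (3e-4) to the stated minimum (3e-5). The warmup length and total step count are assumptions chosen for the example, not published values.

```python
import math

def cosine_lr(step: int,
              max_lr: float = 3e-4,
              min_lr: float = 3e-5,
              warmup_steps: int = 2_000,       # assumed, not published
              total_steps: int = 250_000) -> float:  # assumed, not published
    """Llama 2-style schedule: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With a batch of roughly 4 million tokens per step, the (assumed) total step count determines the overall token budget of the run.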

Use Cases

This model is well-suited for general-purpose natural language processing tasks requiring broad knowledge and contextual understanding (a minimal usage example follows the list), such as:

  • Text generation
  • Question answering
  • Content summarization
  • Information extraction
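
The snippet below is a minimal text-generation sketch using the Hugging Face transformers library. It assumes the weights are hosted under this repository ID, that the accelerate package is installed for device_map="auto", and that enough GPU memory is available for a 7B checkpoint; the prompt and sampling parameters are illustrative.

```python
# Minimal generation sketch; repo ID and sampling settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "m-a-p/OpenLLaMA-Reproduce-1191.18B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

prompt = "Summarize the key ideas behind the transformer architecture:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True,
                         temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```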