m-a-p/OpenLLaMA-Reproduce-1409.29B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 1, 2024 · Architecture: Transformer

The m-a-p/OpenLLaMA-Reproduce-1409.29B is a 7-billion-parameter language model in the OpenLLaMA family, designed to produce high-quality, contextually relevant text. It was trained on a diverse composite dataset spanning web data, scholarly articles, and literature for broad domain coverage, and is optimized for general-purpose text generation and understanding across a wide range of topics.

OpenLLaMA 7Bv2 Model Overview

OpenLLaMA 7Bv2 is a 7-billion-parameter language model trained to provide high-quality, contextually relevant text predictions. It distinguishes itself through training on a highly diverse composite dataset drawn from a wide array of sources, which gives it broad applicability and robust understanding across domains.
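
As with other OpenLLaMA checkpoints, the model should load through the standard Hugging Face transformers Llama classes. The snippet below is a minimal, unverified sketch: the repository ID is taken from this card, and the choice of the slow LlamaTokenizer follows general OpenLLaMA guidance (the auto-converted fast tokenizer has been reported to tokenize incorrectly for these models); check the actual repository files before relying on it.

```python
# Minimal text-generation sketch; assumes the checkpoint follows the
# standard OpenLLaMA/transformers layout.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "m-a-p/OpenLLaMA-Reproduce-1409.29B"

# LlamaTokenizer is the slow tokenizer; OpenLLaMA guidance recommends
# avoiding the auto-converted fast tokenizer for these checkpoints.
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```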

Key Training Data Sources

The model's training leveraged a rich and varied dataset, contributing to its comprehensive knowledge base (a sketch of how such a mixture might be sampled follows the list):

  • Falcon RefinedWeb dataset: general web-crawled text.
  • StarCoder data: source code, contributing code-related understanding.
  • Wikipedia: encyclopedic knowledge.
  • arXiv: academic papers for scientific understanding.
  • Extensive book collections: multiple genres for broad literary context.
  • Stack Exchange data: question-answer pairs and technical discussions, as curated by RedPajama.
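
The exact sampling proportions for this reproduction are not stated here, but composite pretraining corpora are typically built by drawing from each source in proportion to a configured weight. The sketch below illustrates that idea in plain Python; the weights are hypothetical placeholders, not the mixture actually used for this model.

```python
import random

# Hypothetical mixture weights -- placeholders, not the model's actual recipe.
MIXTURE = {
    "falcon-refinedweb": 0.55,
    "starcoder": 0.15,
    "wikipedia": 0.05,
    "arxiv": 0.05,
    "books": 0.10,
    "stackexchange": 0.10,
}

def sample_source(rng: random.Random) -> str:
    """Pick the source of the next training document in proportion to its weight."""
    names = list(MIXTURE)
    return rng.choices(names, weights=[MIXTURE[n] for n in names], k=1)[0]

# Over many draws, the empirical counts track the configured proportions.
rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(100_000):
    counts[sample_source(rng)] += 1
print({name: round(c / 100_000, 3) for name, c in counts.items()})
```

In a real pipeline this per-document sampling is usually handled by the data loader (for example, weighted interleaving of streaming datasets), with the weights tuned so each source contributes the intended share of training tokens.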

Training Procedure Highlights

The training process for OpenLLaMA 7Bv2 used the following configuration, chosen for efficient and stable convergence:

  • Learning Rate: peaked at a maximum of 3e-4 and decayed to a minimum of 3e-5.
  • Batch Size: a global batch size of 4 million tokens.
  • Learning Rate Scheduler: the scheduling strategy closely mirrors that of Llama 2, supporting stable convergence during training; a sketch of such a schedule follows below.
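
Llama 2's published schedule is a linear warmup followed by cosine decay to 10% of the peak learning rate, which matches the 3e-4 to 3e-5 bounds above. The sketch below implements that schedule under this assumption; the warmup length and total step count are illustrative values, not figures reported for this model.

```python
import math

# LR bounds from the card above; warmup/total steps are illustrative assumptions.
MAX_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS, TOTAL_STEPS = 2_000, 250_000

def lr_at(step: int) -> float:
    """Linear warmup to MAX_LR, then cosine decay down to MIN_LR."""
    if step < WARMUP_STEPS:
        return MAX_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine

for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>7}: lr = {lr_at(step):.2e}")
```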