m-a-p/OpenLLaMA-Reproduce-1023.41B

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Apr 1, 2024 · Architecture: Transformer

OpenLLaMA-Reproduce-1023.41B is a 7 billion parameter language model from m-a-p, designed to generate high-quality, contextually relevant text. It is trained on a diverse composite dataset that includes web-crawled data, scholarly articles, and question-answer pairs. The model is notable for its broad domain coverage, and its training procedure closely follows the Llama2 learning rate scheduling strategy.


OpenLLaMA 7Bv2 Model Overview

OpenLLaMA 7Bv2 is a 7 billion parameter language model developed by m-a-p, focused on generating high-quality, contextually relevant text. It distinguishes itself through its comprehensive training data and optimized training methodology.

Key Capabilities & Training Details

  • Diverse Training Data: The model was trained on a rich composite dataset, ensuring broad domain understanding. This includes:
    • Falcon RefinedWeb dataset
    • StarCoder datasets
    • Wikipedia for encyclopedic knowledge
    • arXiv for scientific understanding
    • A vast collection of books
    • Stack Exchange data curated by RedPajama
  • Optimized Training Procedure: The training utilized a maximum learning rate of 3e-4 and a minimum of 3e-5, with a batch size of 4 million tokens. The learning rate scheduler closely mirrors the strategy employed in Llama2, contributing to stable and efficient convergence.
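
The Llama2 recipe the card refers to is a linear warmup followed by cosine decay to roughly 10% of the peak learning rate, which matches the 3e-4 / 3e-5 values quoted above. The following is a minimal sketch of such a schedule; the warmup and total step counts are illustrative placeholders, not values published for this model.

```python
import math

# Values quoted in the card above.
MAX_LR = 3e-4
MIN_LR = 3e-5


def lr_at_step(step: int, warmup_steps: int = 2_000, total_steps: int = 250_000) -> float:
    """Llama2-style schedule: linear warmup, then cosine decay to MIN_LR.

    warmup_steps and total_steps are hypothetical, chosen only to
    illustrate the shape of the schedule.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return MAX_LR * step / warmup_steps
    # Cosine decay from MAX_LR down to MIN_LR over the remaining steps.
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (MAX_LR - MIN_LR) * cosine


if __name__ == "__main__":
    for s in (0, 1_000, 2_000, 125_000, 250_000):
        print(f"step {s:>7}: lr = {lr_at_step(s):.2e}")
```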

Good For

  • Applications requiring broad domain knowledge and contextually relevant text generation.
  • Tasks benefiting from a model trained on a diverse mix of web data, academic papers, and structured Q&A.
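
For applications like these, the model can be used for plain text generation through the Hugging Face `transformers` library. The sketch below assumes the weights are published on the Hub under the repository name shown in the page title (not confirmed here); the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id taken from the page title; verify the exact Hub name before use.
MODEL_ID = "m-a-p/OpenLLaMA-Reproduce-1023.41B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")

# Simple Q&A-style prompt, matching the structured Q&A use case above.
prompt = "Q: What is the capital of France?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding for a short, deterministic completion.
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```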