itsliupeng/openllama-7b-icl

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Dec 8, 2023 · License: apache-2.0 · Architecture: Transformer · Open Weights

The itsliupeng/openllama-7b-icl model is a 7 billion parameter OpenLLaMA reproduction trained with a specific in-context learning (ICL) methodology. Developed by itsliupeng, it was pretrained on a diverse dataset of nearly 1 trillion tokens spanning Falcon, Starcoder, Wikipedia, ArXiv, books, and StackExchange. What distinguishes the model is its reorganization of the Falcon documents, following the approach described in the in-context learning arXiv paper, which makes it suitable for tasks that benefit from stronger in-context learning.


Model Overview

The itsliupeng/openllama-7b-icl is a 7 billion parameter OpenLLaMA model, a reproduction effort by itsliupeng. It was trained on 128 H100 GPUs in bfloat16 precision and is distinguished by its ICL-oriented training methodology.
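Since this is a LLaMA-family checkpoint hosted on Hugging Face, it should load through the standard transformers causal-LM interface. The snippet below is a minimal sketch under that assumption, not an official usage recipe; adjust the dtype and device placement to your hardware, and note that earlier OpenLLaMA releases recommended the slow tokenizer (use_fast=False) if decoded text looks wrong.

```python
# Minimal sketch: loading and sampling from the model with Hugging Face
# transformers. Assumes a standard OpenLLaMA/LLaMA checkpoint layout.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "itsliupeng/openllama-7b-icl"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bfloat16
    device_map="auto",           # spread across available GPUs, if any
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```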

Key Characteristics

  • Architecture: Based on the OpenLLaMA 7B architecture.
  • Training Data: Pretrained on nearly 1 trillion tokens, combining Falcon and Starcoder data with the Wikipedia, ArXiv, books, and StackExchange subsets of RedPajama.
  • Training Methodology: Trained for a single epoch with 2,000 warm-up steps and a cosine learning rate schedule (peak learning rate 3e-5, batch size of 4M tokens); a sketch of this schedule appears after this list.
  • Unique Feature: The primary difference from the base OpenLLaMA 7B model lies in the organization of Falcon documents, which strictly adheres to the methodology outlined in the in-context learning arXiv paper.
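To make the training schedule concrete, the following sketch computes the learning rate implied by those hyperparameters. Only the 2,000 warm-up steps, the 3e-5 peak rate, and the cosine shape come from the card; total_steps and min_lr are hypothetical placeholders.

```python
# Illustrative sketch of a linear-warmup + cosine-decay LR schedule.
# Only peak_lr=3e-5, warmup_steps=2000, and the cosine shape come from
# the model card; total_steps and min_lr are hypothetical placeholders.
import math

def learning_rate(step, peak_lr=3e-5, warmup_steps=2000,
                  total_steps=250_000, min_lr=0.0):
    if step < warmup_steps:
        # Linear warm-up from 0 to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

for s in (0, 1000, 2000, 125_000, 250_000):
    print(f"step {s:>7}: lr = {learning_rate(s):.2e}")
```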

Potential Use Cases

  • Research into In-Context Learning: Ideal for researchers exploring the impact and effectiveness of specific in-context learning strategies.
  • Applications requiring ICL: Suitable for tasks where the ICL-oriented document organization could offer performance advantages; a minimal few-shot prompting sketch appears after this list.
  • General Language Understanding: Can be used for a broad range of natural language processing tasks, leveraging its extensive pretraining.
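In practice, in-context learning is exercised by packing a few demonstrations into the prompt and letting the model complete the next case. Below is a minimal few-shot sketch; the sentiment task and its examples are invented purely for illustration and are not from the model card.

```python
# Hypothetical few-shot sentiment prompt; the demonstrations and labels
# are invented for illustration. Reuses the transformers loading pattern
# from the earlier sketch.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "itsliupeng/openllama-7b-icl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = (
    "Review: The plot dragged and the acting was flat. Sentiment: negative\n"
    "Review: A warm, funny, beautifully shot film. Sentiment: positive\n"
    "Review: I would happily watch it again. Sentiment:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=3, do_sample=False)
# Print only the completion, not the echoed prompt.
completion = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(completion, skip_special_tokens=True))
```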