itsliupeng/openllama-7b-base

Text Generation

  • Concurrency Cost: 1
  • Model Size: 7B
  • Quant: FP8
  • Ctx Length: 4K
  • Published: Dec 8, 2023
  • License: apache-2.0
  • Architecture: Transformer

itsliupeng/openllama-7b-base is a 7 billion parameter reproduction of the OpenLLaMA architecture, trained in Bfloat16 on 128 H100 GPUs. It was pretrained on a dataset of nearly 1 trillion tokens drawn from Falcon, Starcoder, and RedPajama data (Wikipedia, ArXiv, Books, StackExchange). The model is designed for general language understanding and generation, serving as a base model for further fine-tuning or research. Its training on a diverse, large-scale dataset makes it a capable foundation for a variety of NLP applications.


Model Overview

itsliupeng/openllama-7b-base is a 7 billion parameter language model, representing a reproduction of the OpenLLaMA architecture. This model was trained using 128 H100 GPUs in Bfloat16 precision, focusing on efficient and scalable pretraining.

Training Details

The model's pretraining dataset comprised nearly 1 trillion tokens, drawn from a diverse mix including Falcon, Starcoder, and components of RedPajama (specifically Wikipedia, ArXiv, Books, and StackExchange). Training ran for a single epoch with 2000 warm-up steps and a cosine learning rate schedule, using a peak learning rate of 3e-5 and a batch size of 4M tokens.
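The schedule above can be sketched in plain Python. This is a minimal illustration, not the training code: the total step count (250,000, i.e. roughly 1T tokens divided by 4M-token batches) is an assumption inferred from the figures in this card.

```python
import math

def lr_at_step(step, warmup_steps=2000, total_steps=250_000, peak_lr=3e-5):
    """Linear warmup followed by cosine decay, matching the card's
    stated hyperparameters (2000 warm-up steps, peak LR 3e-5).
    total_steps is a hypothetical value: ~1T tokens / 4M-token batches."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Cosine decay from peak_lr down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(1000))     # halfway through warmup: 1.5e-05
print(lr_at_step(2000))     # peak: 3e-05
```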

Performance Benchmarks

Evaluated on the Open LLM Leaderboard, itsliupeng/openllama-7b-base demonstrates a balanced performance across various tasks. Key scores include:

  • Avg.: 47.09
  • AI2 Reasoning Challenge (25-Shot): 46.16
  • HellaSwag (10-Shot): 76.40
  • MMLU (5-Shot): 42.82
  • TruthfulQA (0-shot): 36.65
  • Winogrande (5-shot): 70.88
  • GSM8k (5-shot): 9.63
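The reported average is the unweighted mean of the six task scores, which can be verified directly:

```python
# Open LLM Leaderboard scores from the list above.
scores = {
    "ARC (25-shot)": 46.16,
    "HellaSwag (10-shot)": 76.40,
    "MMLU (5-shot)": 42.82,
    "TruthfulQA (0-shot)": 36.65,
    "Winogrande (5-shot)": 70.88,
    "GSM8k (5-shot)": 9.63,
}

avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 47.09, matching the reported Avg.
```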

Use Cases

This model is suitable as a foundational language model for a wide range of natural language processing tasks. Its base nature makes it ideal for researchers and developers looking to fine-tune for specific applications, explore language understanding, or generate text based on its extensive pretraining.
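As a starting point, the model can be loaded for inference or fine-tuning with the Hugging Face `transformers` library. The snippet below is a minimal sketch assuming `transformers` and `torch` are installed; Bfloat16 is chosen to match the precision the model was trained in.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "itsliupeng/openllama-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights in Bfloat16, matching the training precision.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base-model completion: no chat template, just raw continuation.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base model rather than an instruction-tuned one, it continues text instead of following prompts, so downstream applications will typically fine-tune it on task-specific data first.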