itsliupeng/openllama-7b-base
itsliupeng/openllama-7b-base is a 7 billion parameter language model reproducing the OpenLLaMA architecture, trained in bfloat16 on 128 H100 GPUs. It was pretrained on nearly 1 trillion tokens drawn from Falcon, Starcoder, and RedPajama data (Wikipedia, ArXiv, books, StackExchange). The model is designed for general language understanding and generation, serving as a base model for further fine-tuning or research. Its training on a diverse, large-scale dataset positions it as a capable foundation for a wide range of NLP applications.
Model Overview
itsliupeng/openllama-7b-base is a 7 billion parameter language model that reproduces the OpenLLaMA architecture. It was trained on 128 H100 GPUs in bfloat16 precision, emphasizing efficient, scalable pretraining.
Training Details
The model's pretraining dataset comprised nearly 1 trillion tokens, drawing from a diverse mix including Falcon, Starcoder, and components of RedPajama (specifically Wikipedia, ArXiv, Books, and StackExchange). Training ran for a single epoch with 2000 warm-up steps and a cosine learning rate schedule, using a peak learning rate of 3e-5 and a 4M-token batch size.
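The warm-up plus cosine schedule described above can be sketched as a small function. The warm-up steps (2000) and peak learning rate (3e-5) come from the card; the total step count of 250,000 is an assumption derived from ~1T tokens at a 4M-token batch size, and the decay-to-zero floor is also an assumption:

```python
import math

def lr_at(step: int, total_steps: int, warmup: int = 2000,
          peak: float = 3e-5, floor: float = 0.0) -> float:
    """Linear warm-up to `peak`, then cosine decay to `floor`."""
    if step < warmup:
        # Linear ramp from 0 to the peak learning rate.
        return peak * step / warmup
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup) / (total_steps - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

# Assumed total: ~1e12 tokens / 4e6 tokens per batch = 250,000 steps.
TOTAL_STEPS = 250_000
```

In practice this shape matches what schedulers such as `transformers.get_cosine_schedule_with_warmup` produce; the sketch just makes the arithmetic explicit.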
Performance Benchmarks
Evaluated on the Open LLM Leaderboard, itsliupeng/openllama-7b-base demonstrates balanced performance across tasks. Key scores include:
- Avg.: 47.09
- AI2 Reasoning Challenge (25-Shot): 46.16
- HellaSwag (10-Shot): 76.40
- MMLU (5-Shot): 42.82
- TruthfulQA (0-shot): 36.65
- Winogrande (5-shot): 70.88
- GSM8k (5-shot): 9.63
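The reported average is simply the unweighted mean of the six task scores, which can be checked directly:

```python
# Open LLM Leaderboard scores as reported on the card.
scores = {
    "ARC (25-shot)": 46.16,
    "HellaSwag (10-shot)": 76.40,
    "MMLU (5-shot)": 42.82,
    "TruthfulQA (0-shot)": 36.65,
    "Winogrande (5-shot)": 70.88,
    "GSM8k (5-shot)": 9.63,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 47.09, matching the reported "Avg." score
```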
Use Cases
This model is suitable as a foundational language model for a wide range of natural language processing tasks. As a base (non-instruction-tuned) model, it is best suited for researchers and developers who want to fine-tune it for specific applications, study language understanding, or generate text that builds on its extensive pretraining.
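A minimal sketch of loading the checkpoint for inference with Hugging Face `transformers`. The library calls are standard, but the prompt, generation settings, and lazy-import structure are illustrative assumptions, not settings from the model card (and downloading the 7B checkpoint requires substantial disk space and memory):

```python
MODEL_ID = "itsliupeng/openllama-7b-base"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a completion from the base model (sketch)."""
    # Imported lazily so the sketch can be read without the heavy
    # dependencies installed; assumes `torch` and `transformers`.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 matches the precision the model was trained in.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("The capital of France is"))
```

Because this is a base model, it continues text rather than following instructions; prompts should be framed as completions, not chat turns.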