itsliupeng/openllama-7b-icl
The itsliupeng/openllama-7b-icl model is a 7 billion parameter OpenLLaMA reproduction developed by itsliupeng, pretrained with an in-context learning (ICL) oriented document ordering. It uses a diverse pretraining dataset including Falcon, Starcoder, Wikipedia, ArXiv, books, and StackExchange, totaling nearly 1 trillion tokens. The model is distinguished by how its Falcon documents are organized, following the approach detailed in the 'in-context learning' arXiv paper, which makes it well suited to tasks that benefit from stronger in-context learning.
Model Overview
The itsliupeng/openllama-7b-icl is a 7 billion parameter OpenLLaMA model, a reproduction effort by itsliupeng. It was trained using 128 H100 GPUs with Bfloat16 precision, distinguishing itself through a specific in-context learning (ICL) methodology.
Key Characteristics
- Architecture: Based on the OpenLLaMA 7B architecture.
- Training Data: Pretrained on a substantial dataset of nearly 1 trillion tokens, comprising Falcon, Starcoder, Wikipedia, ArXiv, books, and StackExchange from RedPajama.
- Training Methodology: Trained for a single epoch with 2,000 warm-up steps and a cosine learning rate schedule, using a peak learning rate of 3e-5 and a batch size of 4M tokens.
- Unique Feature: The primary difference from the base OpenLLaMA 7B model lies in the organization of Falcon documents, which strictly adheres to the methodology outlined in the in-context learning arXiv paper.
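Since the model follows the standard OpenLLaMA 7B architecture, it can be loaded like any LLaMA-family checkpoint. The sketch below uses the Hugging Face `transformers` Auto classes with Bfloat16 (matching the training precision); the prompt text is an arbitrary illustration, and the heavy download is deferred to the `__main__` guard.

```python
# Minimal sketch of loading itsliupeng/openllama-7b-icl with
# Hugging Face transformers. Assumes `transformers` and `torch`
# are installed; the checkpoint is ~14 GB in bfloat16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "itsliupeng/openllama-7b-icl"


def load_model(model_id: str = MODEL_ID):
    """Load tokenizer and model in bfloat16, placing weights automatically."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_model()
    # Arbitrary example prompt, not from the model card.
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=16)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```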
Potential Use Cases
- Research into In-Context Learning: Ideal for researchers exploring the impact and effectiveness of specific in-context learning strategies.
- Applications requiring ICL: Suitable for tasks where the unique document organization for ICL could offer performance advantages.
- General Language Understanding: Can be used for a broad range of natural language processing tasks, leveraging its extensive pretraining.
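For the ICL-oriented use cases above, a typical pattern is to prepend a handful of labeled demonstrations to the query and let the model complete the final answer. The helper below is a generic sketch of that prompt construction (the sentiment-classification task and labels are hypothetical examples, not part of the model card); the resulting string would be fed to the model's tokenizer and `generate` call.

```python
# Sketch of building a few-shot in-context learning prompt.
# The task, field names, and labels here are illustrative only.
def build_icl_prompt(demonstrations, query):
    """Concatenate labeled demonstrations, then the unanswered query."""
    parts = [
        f"Review: {text}\nSentiment: {label}"
        for text, label in demonstrations
    ]
    # The final block leaves the label blank for the model to complete.
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)


demos = [
    ("A delightful, well-paced film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_icl_prompt(demos, "Surprisingly good throughout.")
print(prompt)
```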