ibm-granite/granite-7b-base
Granite-7b-base is a 7 billion parameter base large language model developed by IBM Research, replicating Meta's Llama2-7B architecture with Multi-Head Attention (MHA). Pre-trained on 2 trillion tokens with a 4k token context length, it is primarily an English-language model. It serves as an open reference implementation, offering a transparently curated dataset for community and commercial use.
Overview
Granite-7b-base is a 7 billion parameter base large language model developed by IBM Research, released under an Apache-2.0 license. It is an open reference implementation of Meta's Llama2-7B architecture, featuring Multi-Head Attention (MHA) and a 4k token context length. The model was pre-trained from scratch on 2 trillion tokens of IBM-curated data, with detailed data sources and sampling proportions provided for transparency.
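As a standard causal language model, it can be used with common open-source tooling. A minimal usage sketch, assuming the Hugging Face `transformers` library is installed and the checkpoint is available on the Hub under `ibm-granite/granite-7b-base` (the weight download only runs when the script is executed directly):

```python
# Minimal inference sketch (assumption: `transformers` is installed and the
# model id below points at this checkpoint on the Hugging Face Hub).
MODEL_ID = "ibm-granite/granite-7b-base"
MAX_CONTEXT = 4096  # the model's 4k-token context length

def clip_to_context(token_ids, max_len=MAX_CONTEXT):
    """Keep only the most recent tokens so a prompt fits the context window."""
    return token_ids[-max_len:]

if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper above stays stdlib-only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer("The theory of general relativity states", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Since this is a base (non-instruct) model, it completes text rather than follows instructions, so prompts should be phrased as passages to continue.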
Key Capabilities
- Llama2-7B Architecture Replica: Built to mirror the Llama2-7B base variant, offering a familiar and robust foundation.
- Extensive Pre-training: Trained on 2 trillion tokens, including diverse datasets like Common Crawl, GitHub_Clean, Wikipedia, USPTO, PubMed Central, arXiv, StackExchange, PG19, and Webhose.
- Open Source Commitment: Released by IBM Research under an Apache-2.0 license, fostering open innovation and community use.
- Performance: Matches or slightly exceeds Llama2-7B on some LM Evaluation Harness benchmarks, e.g. MMLU (0.50 vs. 0.47, 5-shot weighted average).
Good for
- Research and Development: Ideal for researchers and developers looking for a transparently trained base model to experiment with or build upon.
- Foundation for Fine-tuning: Suitable as a robust base model for further fine-tuning on specific downstream tasks.
- Understanding LLM Training: Provides a clear example of data curation and training methodology, with detailed data source attribution.
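When fine-tuning, a common preprocessing step is to pack tokenized documents into fixed-length blocks that match the 4k context window. A minimal sketch of that step (the token ids here are illustrative integers; a real pipeline would produce them with the model's tokenizer):

```python
# Sequence-packing sketch for fine-tuning or continued pre-training:
# concatenate tokenized documents and split into fixed-size blocks
# matching the model's 4096-token context length.
CONTEXT_LEN = 4096

def pack_sequences(docs, block_size=CONTEXT_LEN):
    """Flatten tokenized docs and cut into blocks of exactly block_size tokens.

    The trailing remainder that does not fill a full block is dropped.
    """
    flat = [tok for doc in docs for tok in doc]
    n_blocks = len(flat) // block_size
    return [flat[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
```

Packing keeps every training example at full context length, which avoids wasting compute on padding tokens.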
Limitations
As a base model, Granite-7b-base has not undergone safety alignment and may produce problematic outputs. Users should implement adequate safeguards and be aware of potential risks like disinformation generation or hallucination, especially in ungrounded scenarios.
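One simple illustration of such a safeguard is a lightweight filter applied to generations before they are returned. This is a toy sketch with a hypothetical placeholder blocklist; production systems should use dedicated moderation models rather than substring matching:

```python
# Toy post-generation safeguard: reject outputs containing blocked terms.
# BLOCKLIST is a hypothetical placeholder, not a shipped resource; real
# deployments need far more robust moderation than substring matching.
BLOCKLIST = {"banned_term_a", "banned_term_b"}

def passes_basic_filter(text, blocklist=BLOCKLIST):
    """Return True if no blocked term appears in the lowercased text."""
    lowered = text.lower()
    return not any(term in lowered for term in blocklist)
```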