ibm-granite/granite-7b-base
Granite-7b-base is a 7 billion parameter base large language model developed by IBM Research, replicating Meta's Llama2-7B architecture with Multi-Head Attention (MHA). Pre-trained on 2 trillion tokens with a 4k token context length, it is primarily an English-language model. It serves as an open reference implementation, offering a transparently curated dataset for community and commercial use.
Overview
Granite-7b-base is a 7 billion parameter base large language model developed by IBM Research, released under an Apache-2.0 license. It is an open reference implementation of Meta's Llama2-7B architecture, featuring Multi-Head Attention (MHA) and a 4k token context length. The model was pre-trained from scratch on 2 trillion tokens of IBM-curated data, with detailed data sources and sampling proportions provided for transparency.
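As a standard causal language model, it can be used with common open-source tooling. A minimal usage sketch, assuming the Hugging Face `transformers` library is installed and the checkpoint is available on the Hub under `ibm-granite/granite-7b-base` (the weight download only runs when the script is executed directly):

```python
# Minimal inference sketch (assumption: `transformers` is installed and the
# model id below points at this checkpoint on the Hugging Face Hub).
MODEL_ID = "ibm-granite/granite-7b-base"
MAX_CONTEXT = 4096  # the model's 4k-token context length

def clip_to_context(token_ids, max_len=MAX_CONTEXT):
    """Keep only the most recent tokens so a prompt fits the context window."""
    return token_ids[-max_len:]

if __name__ == "__main__":
    # Heavy dependencies are imported here so the helper above stays stdlib-only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    inputs = tokenizer("The theory of general relativity states", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Since this is a base (non-instruct) model, it completes text rather than follows instructions, so prompts should be phrased as passages to continue.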
Key Capabilities
- Llama2-7B Architecture Replica: Built to mirror the Llama2-7B base variant, offering a familiar and robust foundation.
- Extensive Pre-training: Trained on 2 trillion tokens, including diverse datasets like Common Crawl, GitHub_Clean, Wikipedia, USPTO, PubMed Central, arXiv, StackExchange, PG19, and Webhose.
- Open Source Commitment: Released by IBM Research under an Apache-2.0 license, fostering open innovation and community use.
- Performance: Matches or slightly exceeds Llama2-7B on some LM Evaluation Harness benchmarks, e.g. MMLU (0.50 vs. 0.47, 5-shot weighted average).
Good for
- Research and Development: Ideal for researchers and developers looking for a transparently trained base model to experiment with or build upon.
- Foundation for Fine-tuning: Suitable as a robust base model for further fine-tuning on specific downstream tasks.
- Understanding LLM Training: Provides a clear example of data curation and training methodology, with detailed data source attribution.
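When fine-tuning, a common preprocessing step is to pack tokenized documents into fixed-length blocks that match the 4k context window. A minimal sketch of that step (the token ids here are illustrative integers; a real pipeline would produce them with the model's tokenizer):

```python
# Sequence-packing sketch for fine-tuning or continued pre-training:
# concatenate tokenized documents and split into fixed-size blocks
# matching the model's 4096-token context length.
CONTEXT_LEN = 4096

def pack_sequences(docs, block_size=CONTEXT_LEN):
    """Flatten tokenized docs and cut into blocks of exactly block_size tokens.

    The trailing remainder that does not fill a full block is dropped.
    """
    flat = [tok for doc in docs for tok in doc]
    n_blocks = len(flat) // block_size
    return [flat[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]
```

Packing keeps every training example at full context length, which avoids wasting compute on padding tokens.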
Limitations
As a base model, Granite-7b-base has not undergone safety alignment and may produce problematic outputs. Users should implement adequate safeguards and be aware of potential risks like disinformation generation or hallucination, especially in ungrounded scenarios.
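One simple illustration of such a safeguard is a lightweight filter applied to generations before they are returned. This is a toy sketch with a hypothetical placeholder blocklist; production systems should use dedicated moderation models rather than substring matching:

```python
# Toy post-generation safeguard: reject outputs containing blocked terms.
# BLOCKLIST is a hypothetical placeholder, not a shipped resource; real
# deployments need far more robust moderation than substring matching.
BLOCKLIST = {"banned_term_a", "banned_term_b"}

def passes_basic_filter(text, blocklist=BLOCKLIST):
    """Return True if no blocked term appears in the lowercased text."""
    lowered = text.lower()
    return not any(term in lowered for term in blocklist)
```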