Model Overview

Granite-3.1-8B-Base is an 8.1 billion parameter large language model developed by the Granite Team at IBM. It uses a decoder-only dense transformer architecture with grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. Its main differentiator is a 128K-token context length, up from its predecessor's 4K, achieved through a progressive training strategy over approximately 500 billion tokens.
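Two of the components named above, RMSNorm and the SwiGLU MLP, are small enough to sketch directly. The following is a toy pure-Python illustration of the standard formulations, not the model's actual implementation: the function names, the list-based vectors, the `eps` value, and the tiny dimensions are all assumptions chosen for readability.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a learned per-channel weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down(silu(gate(x)) * up(x)), without bias terms."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, and the SwiGLU MLP uses three weight matrices (gate, up, down) where a plain two-layer MLP uses two.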

Key Capabilities & Features

Extended Context Window: Supports a 128K token context length, making it highly suitable for processing and generating long documents.
Multilingual Support: Capable of handling English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with potential for fine-tuning in other languages.
General-Purpose Base Model: Designed to excel in various text-to-text generation tasks including summarization, text classification, information extraction, and question-answering.
Foundation for Specialization: Serves as a strong baseline model for developers to create more specialized models tailored to specific application scenarios.
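When working with the extended context window, it helps to check the token budget before sending a long document, because the prompt and the generated tokens share the same window. The sketch below assumes that "128K" means 128 × 1024 = 131,072 tokens and that the input has already been tokenized into a list of IDs; `fits_in_context` and `split_into_windows` are hypothetical helper names, not part of any Granite tooling.

```python
# Assumption: "128K" is read as 128 * 1024 tokens.
CONTEXT_WINDOW = 128 * 1024


def fits_in_context(prompt_tokens, max_new_tokens, window=CONTEXT_WINDOW):
    """Return True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= window


def split_into_windows(token_ids, max_new_tokens, window=CONTEXT_WINDOW):
    """Split token IDs into consecutive chunks that each leave room for generation."""
    budget = window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return [token_ids[i:i + budget] for i in range(0, len(token_ids), budget)]
```

Documents that exceed the budget can be processed chunk by chunk and the partial results combined afterwards; that strategy is an illustration, not a recommendation from the model card.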

Performance Highlights

On the Hugging Face Open LLM Leaderboard V1, Granite-3.1-8B-Base achieved an average score of 66.85, including 63.99 on ARC-Challenge and 63.45 on MMLU. On the V2 leaderboard, it achieved an average score of 20.07, including 42.21 on IFEval and 26.02 on BBH.

Intended Use Cases

This model is well suited to applications that must understand and generate text from long inputs. It can be used for:

Long-form content summarization: Condensing lengthy articles, reports, or legal documents.
Complex question-answering: Extracting precise answers from large bodies of text.
Document analysis: Classifying and extracting information from long documents.
Building specialized LLMs: Providing a robust foundation for further fine-tuning on domain-specific datasets.
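As a rough illustration of the summarization use case, the sketch below loads the model with the Hugging Face `transformers` library and completes a plain-text prompt. Several things here are assumptions rather than details from this page: the Hub identifier `ibm-granite/granite-3.1-8b-base`, the prompt layout, the generation settings, and the helper names `build_summary_prompt` and `summarize`. Because this is a base model rather than an instruction-tuned one, the prompt is written as text to be continued, not as a chat message.

```python
# Assumption: the model is published on the Hugging Face Hub under this ID.
MODEL_ID = "ibm-granite/granite-3.1-8b-base"


def build_summary_prompt(document):
    """Frame the document so that a base model continues with a summary."""
    return f"Document:\n{document.strip()}\n\nSummary:"


def summarize(document, max_new_tokens=200):
    """Generate a summary; requires torch, transformers, and accelerate."""
    # Imported lazily so build_summary_prompt works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # illustrative choice to reduce memory use
        device_map="auto",           # needs the accelerate package
    )
    inputs = tokenizer(build_summary_prompt(document), return_tensors="pt")
    inputs = inputs.to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Calling `summarize(text)` downloads the model weights on first use, so it needs substantial disk space and memory. For the other listed use cases, only the prompt builder would need to change.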
