ibm-granite/granite-3.1-8b-base
Granite-3.1-8B-Base is an 8.1 billion parameter decoder-only dense transformer model developed by the Granite Team at IBM. It features an extended context length of 128K tokens, achieved through progressive training from an initial 4K context. This model is designed for long-context text-to-text generation tasks such as summarization, classification, extraction, and question-answering, and serves as a robust baseline for specialized applications.
Model Overview
Granite-3.1-8B-Base is an 8.1 billion parameter large language model developed by the Granite Team at IBM. It uses a decoder-only dense transformer architecture with grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. Its main differentiator is a context length of 128K tokens, up from its predecessor's 4K, reached through a progressive training strategy over approximately 500 billion tokens.
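To make three of the architectural terms concrete, here is a minimal pure-Python sketch of RMSNorm, the SwiGLU gating step, and the GQA head mapping. It is an illustration under stated assumptions, not IBM's implementation: the function names, the `eps` default, and the head counts in the usage note are hypothetical, and the linear projections around the SwiGLU gate are omitted.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-element learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps default is an assumption
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu(gate, up):
    """Elementwise SiLU(gate) * up, the gating step inside a SwiGLU MLP.

    In a full MLP, `gate` and `up` are two linear projections of the same
    input and the result passes through a third projection; those are omitted.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """GQA: consecutive query heads share a single key/value head."""
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size
```

For example, with a hypothetical 8 query heads and 2 key/value heads, `kv_head_for_query_head(5, 8, 2)` maps query head 5 to key/value head 1, so the key/value cache holds 2 heads instead of 8.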
Key Capabilities & Features
- Extended Context Window: Supports a 128K token context length, making it highly suitable for processing and generating long documents.
- Multilingual Support: Capable of handling English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with potential for fine-tuning in other languages.
- General-Purpose Base Model: Designed to excel in various text-to-text generation tasks including summarization, text classification, information extraction, and question-answering.
- Foundation for Specialization: Serves as a strong baseline model for developers to create more specialized models tailored to specific application scenarios.
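When working against a fixed context window, the prompt and the generated tokens share the same budget. The sketch below shows one way to compute that budget and to split an over-long token sequence into overlapping windows. It assumes that "128K" means 131,072 tokens; the helper names and the overlap scheme are hypothetical and not part of any Granite tooling.

```python
CONTEXT_WINDOW = 128 * 1024  # assumption: "128K" means 131,072 tokens


def max_prompt_tokens(max_new_tokens, context_window=CONTEXT_WINDOW):
    """Return how many prompt tokens fit once room is reserved for generation."""
    if not 0 <= max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be in [0, context_window)")
    return context_window - max_new_tokens


def chunk_tokens(token_ids, chunk_size, overlap=0):
    """Split token_ids into windows of at most chunk_size tokens.

    Neighbouring windows share `overlap` tokens so that text cut at a
    boundary still appears whole in one of them.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not token_ids:
        return []
    step = chunk_size - overlap
    last_start = max(len(token_ids) - overlap, 1)
    return [token_ids[i:i + chunk_size] for i in range(0, last_start, step)]
```

A document that already fits within `max_prompt_tokens(...)` needs no chunking; the splitter is only a fallback for inputs that exceed the window.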
Performance Highlights
On the Hugging Face Open LLM Leaderboard V1, Granite-3.1-8B-Base achieved an average score of 66.85, including 63.99 on ARC-Challenge and 63.45 on MMLU. On the V2 leaderboard, it scored an average of 20.07, including 42.21 on IFEval and 26.02 on BBH.
Intended Use Cases
This model is well suited to applications that must read long textual inputs and generate output grounded in them. It can be used for:
- Long-form content summarization: Condensing lengthy articles, reports, or legal documents.
- Complex question-answering: Extracting precise answers from large bodies of text.
- Document analysis: Classifying and extracting information from long documents.
- Building specialized LLMs: Providing a robust foundation for further fine-tuning on domain-specific datasets.
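As a starting point for the summarization use case, the following sketch loads the model through the Hugging Face `transformers` library and asks it to continue a summary prompt. It assumes `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed and that enough GPU memory is available. The prompt template, the helper names, and the `max_new_tokens` default are illustrative choices, not a format prescribed by the model card.

```python
MODEL_ID = "ibm-granite/granite-3.1-8b-base"  # model identifier from the page title


def build_summary_prompt(document):
    """Frame summarization as text completion, since a base model continues text."""
    # Hypothetical template; a base model has no required prompt format.
    return f"Document:\n{document.strip()}\n\nSummary:"


def generate(prompt, max_new_tokens=200):
    """Load the model and return only the newly generated continuation."""
    # Imported lazily so the prompt helper works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: hardware supports bfloat16
        device_map="auto",           # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate(build_summary_prompt(report_text))` on a long report would return the model's continuation after "Summary:". Because this is a base model rather than an instruction-tuned one, output quality depends heavily on how the prompt is framed, and a few in-context examples may help.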