Model Overview

Granite-3.1-2B-Base is a 2.5-billion-parameter, decoder-only dense transformer model from IBM, part of the Granite 3.1 series. It extends the context length to 128K tokens, a substantial increase over its predecessor, through a progressive training strategy that increased the supported length in stages while adjusting RoPE theta. This long-context pre-training stage used approximately 500 billion tokens.
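
To illustrate why RoPE theta matters for context length: in rotary position embeddings, dimension pair i rotates with inverse frequency theta^(-2i/d), so a larger theta stretches the longest positional wavelength. The sketch below is illustrative only; the head dimension (64) and both theta values are assumptions chosen for the example, not Granite's published settings.

```python
import math

def rope_wavelengths(theta: float, head_dim: int) -> list[float]:
    """Wavelength, in tokens, of each rotary dimension pair."""
    # Pair i rotates by position * theta ** (-2 * i / head_dim) radians,
    # so one full turn takes 2 * pi * theta ** (2 * i / head_dim) tokens.
    return [
        2 * math.pi * theta ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]

# Hypothetical values: compare a small and a large theta.
for theta in (10_000.0, 10_000_000.0):
    longest = rope_wavelengths(theta, head_dim=64)[-1]
    print(f"theta={theta:>12,.0f}  longest wavelength ~ {longest:,.0f} tokens")
```

The slowest-rotating pair sets the scale over which positions stay distinguishable, which is why long-context training schedules typically raise theta as the target length grows.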

Key Capabilities

Extended Context Window: Supports a 128K token context length, making it suitable for tasks requiring extensive contextual understanding.
Multilingual Support: Trained to support 12 languages, including English, German, Spanish, French, Japanese, and Chinese, with potential for fine-tuning in additional languages.
Versatile Text Generation: Capable of handling various text-to-text generation tasks such as summarization, text classification, information extraction, and question-answering.
Robust Architecture: Uses a decoder-only dense transformer architecture with grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.
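
Three of the architectural components named above can be sketched in a few lines of plain Python. This is a minimal illustration of the general techniques, not Granite's implementation; the function names, the epsilon value, and the head counts in the usage lines are all hypothetical.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root mean square, then apply a learned scale."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    """SwiGLU gating: SiLU(gate) * up, applied elementwise.

    In a full MLP, `gate` and `up` are two linear projections of the same
    input, and the result passes through a third (down) projection.
    """
    return [silu(g) * u for g, u in zip(gate, up)]

def kv_head_for(query_head, n_query_heads, n_kv_heads):
    """GQA: each consecutive group of query heads shares one key/value head."""
    return query_head // (n_query_heads // n_kv_heads)

# Hypothetical sizes, for illustration only.
print(rms_norm([3.0, 4.0], weight=[1.0, 1.0]))
print(swiglu(gate=[0.0, 1.0], up=[5.0, 2.0]))
print([kv_head_for(h, n_query_heads=8, n_kv_heads=2) for h in range(8)])
```

Sharing key/value heads across query heads shrinks the key/value cache, which is one reason GQA is a common choice for long-context models.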

Intended Use Cases

This base model is well-suited for developers looking to:

Implement long-context tasks like detailed document summarization or complex question-answering over large texts.
Develop specialized models for specific application scenarios by using Granite-3.1-2B-Base as a foundational baseline.
Address generative AI tasks in supported languages, leveraging its broad training data from web, code, academic, and book sources.
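
A long-context use case such as document summarization might look like the sketch below, which uses the Hugging Face transformers library. The repository name, the reading of "128K" as 131,072 tokens, the prompt format, and the helper names are assumptions made for this example; check the full model card for the recommended usage.

```python
MODEL_ID = "ibm-granite/granite-3.1-2b-base"  # assumed Hugging Face repository name
CONTEXT_WINDOW = 128 * 1024  # assumes "128K" means 131,072 tokens

def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= window

def summarize(document: str, max_new_tokens: int = 256) -> str:
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    # A base model has no chat template; it simply continues the text,
    # so the prompt ends with a cue for the summary (hypothetical format).
    prompt = f"{document}\n\nSummary:"
    inputs = tokenizer(prompt, return_tensors="pt")
    n_prompt = inputs["input_ids"].shape[1]
    if not fits_in_context(n_prompt, max_new_tokens):
        raise ValueError("prompt too long for the context window")

    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the continuation.
    return tokenizer.decode(output[0][n_prompt:], skip_special_tokens=True)
```

Checking the token budget before calling generate avoids silently exceeding the window on very large inputs. Because this is a base model, few-shot examples in the prompt usually steer it better than instructions do.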

Limitations and Considerations

As a base model, Granite-3.1-2B-Base has not undergone safety alignment and may produce problematic outputs. Risks include bias, misinformation, and hallucination, and its relatively small size may make it more prone to them than larger models. Ethical and responsible deployment is strongly encouraged.
