ibm-granite/granite-3.1-2b-base

Text generation | Model size: 2B | Quant: BF16 | Ctx length: 32k | Published: Dec 6, 2024 | License: apache-2.0 | Architecture: Transformer | Open weights

Granite-3.1-2B-Base is a 2.5-billion-parameter, decoder-only dense transformer model developed by IBM. Its context length is extended to 128K tokens through a progressive training strategy that adjusts RoPE theta, using approximately 500 billion tokens of long-context training. The model targets text-to-text generation tasks such as summarization, classification, extraction, and question-answering, with particular emphasis on long-context scenarios. It supports 12 languages, including English, German, and Japanese, and can serve as a baseline for specialized applications.


Model Overview

Granite-3.1-2B-Base is a 2.5-billion-parameter, decoder-only dense transformer model from IBM and part of the Granite 3.1 series. It extends the context length of its predecessor, Granite-3.0-2B-Base, from 4K to 128K tokens using a progressive training strategy: the supported context length was increased in stages while RoPE theta was adjusted, until the model adapted to the full 128K length. This long-context pre-training stage used approximately 500 billion tokens.
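To make the RoPE theta adjustment more concrete, the sketch below shows how theta sets the rotation frequencies in rotary position embeddings. Raising theta slows the slowest-rotating dimension pairs, which is the usual intuition for why larger theta values help a model handle longer sequences. This is a minimal plain-Python illustration: the head dimension of 64 and the theta schedule are hypothetical values, not Granite's actual training configuration, and dimensions are paired adjacently for readability (many implementations pair dimension i with i + d/2 instead).

```python
import math


def rope_inv_freqs(head_dim: int, theta: float) -> list[float]:
    """RoPE rotation frequency for each dimension pair: theta ** (-2i / head_dim)."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def longest_wavelength(head_dim: int, theta: float) -> float:
    """Number of positions the slowest-rotating pair needs to complete one full turn."""
    return 2 * math.pi / rope_inv_freqs(head_dim, theta)[-1]


def apply_rope(vec: list[float], position: int, theta: float) -> list[float]:
    """Rotate each adjacent pair (vec[2i], vec[2i+1]) by the angle position * freq_i."""
    out = list(vec)
    for i, freq in enumerate(rope_inv_freqs(len(vec), theta)):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


if __name__ == "__main__":
    head_dim = 64  # hypothetical head dimension, for illustration only
    # Hypothetical progressive schedule: (target context length, RoPE theta).
    schedule = [(4_096, 1e4), (32_768, 1e6), (131_072, 1e7)]
    for ctx, theta in schedule:
        wl = longest_wavelength(head_dim, theta)
        print(f"ctx={ctx:>7,}  theta={theta:>12,.0f}  longest wavelength ~ {wl:,.0f} positions")
```

Whatever theta is chosen, the query-key dot product after rotation depends only on the offset between the two positions; what theta changes is the range of offsets that the slowest frequencies can still tell apart.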

Key Capabilities

  • Extended Context Window: Supports a 128K-token context length, making it suitable for tasks that require understanding long inputs.
  • Multilingual Support: Trained to support 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese. Users may fine-tune it for additional languages.
  • Versatile Text Generation: Capable of handling various text-to-text generation tasks such as summarization, text classification, information extraction, and question-answering.
  • Architecture: A decoder-only dense transformer using grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings.
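The architecture components listed above can be illustrated with small, dependency-free sketches. These are toy versions under stated assumptions: tiny hand-written weights, an RMSNorm epsilon of 1e-6, and, in the GQA example, 32 query heads sharing 8 key/value heads. None of these values are taken from the released Granite configuration; the function names are hypothetical.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: divide by sqrt(mean(x^2) + eps), then scale per channel (no mean subtraction, no bias)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(v: float) -> float:
    """SiLU (swish) activation, v * sigmoid(v), written to avoid exp overflow."""
    if v >= 0:
        return v / (1.0 + math.exp(-v))
    e = math.exp(v)
    return v * e / (1.0 + e)


def matvec(m: list[list[float]], x: list[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def swiglu_mlp(x, w_gate, w_up, w_down) -> list[float]:
    """SwiGLU MLP: down(silu(gate(x)) * up(x)), with an elementwise product in the middle."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return matvec(w_down, [silu(g) * u for g, u in zip(gate, up)])


def gqa_kv_head(query_head: int, n_query_heads: int, n_kv_heads: int) -> int:
    """GQA: each consecutive group of query heads reads the same key/value head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("query heads must divide evenly into key/value groups")
    return query_head // (n_query_heads // n_kv_heads)


def tied_logits(embedding: list[list[float]], hidden: list[float]) -> list[float]:
    """Shared input/output embeddings: the output projection reuses the token-embedding
    matrix, so logit[t] is the dot product of embedding row t with the final hidden state."""
    return matvec(embedding, hidden)


if __name__ == "__main__":
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print([gqa_kv_head(h, 32, 8) for h in range(32)])
```

Sharing one key/value head across a group of query heads shrinks the key/value cache relative to full multi-head attention, which matters most at long context lengths.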

Intended Use Cases

This base model is well-suited for developers looking to:

  • Implement long-context tasks like detailed document summarization or complex question-answering over large texts.
  • Develop specialized models for specific application scenarios by using Granite-3.1-2B-Base as a foundational baseline.
  • Address generative AI tasks in supported languages, leveraging its broad training data from web, code, academic, and book sources.
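For the use cases above, a minimal sketch of running the base model with Hugging Face `transformers` could look like the following. It assumes `torch`, `transformers`, and `accelerate` (needed for `device_map="auto"`) are installed, takes "128K" as 131,072 tokens, and uses greedy decoding; `fits_in_context` and `complete` are hypothetical helper names. Because this is a base model rather than an instruction-tuned one, the prompt is written as text for the model to continue instead of as a chat instruction.

```python
CONTEXT_LENGTH = 131_072  # assumption: "128K" taken as 128 * 1024 tokens
MODEL_ID = "ibm-granite/granite-3.1-2b-base"


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_length: int = CONTEXT_LENGTH) -> bool:
    """True if the prompt plus the generation budget fits in the context window."""
    return prompt_tokens + max_new_tokens <= context_length


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Greedy continuation of `prompt` using the base model."""
    # Lazy import so fits_in_context works without torch/transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Loads the model on every call for simplicity; cache it in real use.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    n_prompt = inputs["input_ids"].shape[1]
    if not fits_in_context(n_prompt, max_new_tokens):
        raise ValueError(f"{n_prompt} prompt tokens + {max_new_tokens} new tokens exceed the window")

    output = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0, n_prompt:], skip_special_tokens=True)


if __name__ == "__main__":
    # Completion-style prompt: the base model continues the text after "Summary:".
    prompt = (
        "Document: Granite-3.1-2B-Base extends its context window to 128K tokens "
        "using a progressive training strategy.\n"
        "Summary:"
    )
    print(complete(prompt))
```

If a hosted deployment enforces a smaller window than the model's native one, pass that limit as `context_length` so the check reflects what the server will actually accept.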

Limitations and Considerations

As a base model, Granite-3.1-2B-Base has not undergone safety alignment and may produce problematic outputs. Users should be aware of risks such as bias, misinformation, and hallucination; as a relatively small model, it may be more prone to hallucination than larger models. Ethical and responsible deployment is strongly encouraged.