Model Overview

Granite-3.1-3B-A800M-Base is a 3.3-billion-parameter sparse Mixture of Experts (MoE) language model developed by the IBM Granite Team. Roughly 800 million of those parameters are active for each token, hence the "A800M" in its name. Its main distinguishing feature is a 128K-token context window, extended from the 4K window of Granite-3.0-3B-A800M-Base through a progressive training strategy that raises the supported length in stages, using approximately 500 billion tokens. The model uses a decoder-only transformer architecture whose core components are fine-grained experts, dropless token routing, and a load-balancing loss.
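
As a rough illustration of the routing terms above, the sketch below implements generic top-k softmax routing with no per-expert capacity limit (so no token is dropped) and a Switch-Transformer-style load-balancing loss. It is not Granite's implementation: the top-k value, the loss formula, and the helper names `route_dropless` and `load_balancing_loss` are assumptions for illustration. Fine-grained experts would change only how many experts there are and how small each one is, which this routing-only sketch does not model.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_dropless(router_logits: List[List[float]], k: int = 2
                   ) -> Tuple[List[List[Tuple[int, float]]], List[List[int]]]:
    """Assign every token to its top-k experts with no capacity limit.

    Returns per-token (expert, renormalized weight) pairs and, for each expert,
    the token indices it must process. Because no expert has a capacity cap,
    no token is ever dropped (the "dropless" property).
    """
    num_experts = len(router_logits[0])
    assignments: List[List[Tuple[int, float]]] = []
    expert_tokens: List[List[int]] = [[] for _ in range(num_experts)]
    for t, logits in enumerate(router_logits):
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        assignments.append([(e, probs[e] / norm) for e in top])
        for e in top:
            expert_tokens[e].append(t)
    return assignments, expert_tokens


def load_balancing_loss(router_logits: List[List[float]], k: int = 2) -> float:
    """Switch-style auxiliary loss: num_experts * sum_e f_e * P_e.

    f_e is the fraction of routing slots sent to expert e and P_e is the mean
    router probability for e. The loss is 1.0 when tokens and probability mass
    are spread evenly across experts and grows as routing concentrates.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    _, expert_tokens = route_dropless(router_logits, k)
    probs = [softmax(row) for row in router_logits]
    loss = 0.0
    for e in range(num_experts):
        f_e = len(expert_tokens[e]) / (num_tokens * k)
        p_e = sum(p[e] for p in probs) / num_tokens
        loss += f_e * p_e
    return num_experts * loss
```

A balanced router scores about 1.0 on this loss, while one that sends every token to the same expert scores higher; adding a small multiple of it to the training loss nudges the router toward spreading tokens across experts.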

Key Capabilities

Extended Context Window: Supports a 128K-token context length, enabling it to process very long documents and conversations.
Multilingual Support: Handles English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, and can be fine-tuned for additional languages.
General Text-to-Text Generation: Designed for a wide range of tasks, including summarization, text classification, information extraction, and question answering.
Base Model Flexibility: Serves as a foundation for building specialized models tailored to specific applications.
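
The 128K-token window is shared by the prompt and the tokens the model generates. Below is a minimal budgeting sketch, assuming 128K means 131,072 tokens and that token counts come from the model's own tokenizer; `fits_in_context` and `split_into_windows` are hypothetical helpers, not part of any Granite or Hugging Face API.

```python
from typing import List, Sequence

# Assumption: "128K" is read as 128 * 1024 = 131,072 tokens; confirm against the
# model's own configuration before relying on this value.
CONTEXT_WINDOW = 128 * 1024


def fits_in_context(num_prompt_tokens: int, max_new_tokens: int,
                    window: int = CONTEXT_WINDOW) -> bool:
    """Return True if the prompt plus the tokens to be generated fit in the window."""
    return num_prompt_tokens + max_new_tokens <= window


def split_into_windows(token_ids: Sequence[int], max_new_tokens: int,
                       overlap: int = 0,
                       window: int = CONTEXT_WINDOW) -> List[List[int]]:
    """Split an over-long token sequence into chunks that each leave room to generate.

    Each chunk holds at most (window - max_new_tokens) tokens, and consecutive
    chunks share `overlap` tokens so text at a boundary appears in both.
    """
    budget = window - max_new_tokens
    if overlap < 0 or budget <= overlap:
        raise ValueError("window too small for this generation budget and overlap")
    if not token_ids:
        return []
    step = budget - overlap
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + budget]))
        if start + budget >= len(token_ids):
            break  # the last chunk reaches the end of the sequence
        start += step
    return chunks
```

For a document that fits, `fits_in_context` is the only check needed; chunking trades some cross-chunk context for the ability to handle inputs longer than the window.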

Intended Use Cases

This model suits developers and researchers who need a base model with a long (128K-token) context window. It can be applied to:

Applications that must understand, and generate text from, large volumes of input.
Developing custom solutions for summarization, classification, and Q&A.
Fine-tuning for domain-specific tasks or additional languages.
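
Because a base model continues text rather than following instructions, tasks such as classification are usually posed as completion-style prompts with a few worked examples. The helpers below are a hypothetical sketch: the `Text:`/`Label:` layout, the default instruction, and the sentiment labels are assumptions, and the resulting string would be passed to the tokenizer and generation call of whatever inference stack you use.

```python
from typing import List, Tuple


def build_few_shot_prompt(
        examples: List[Tuple[str, str]], query: str,
        instruction: str = "Classify the sentiment of each text as positive or negative.",
) -> str:
    """Format labeled examples plus an unlabeled query as a completion prompt.

    The prompt ends with "Label:" so the model's next tokens are the predicted label.
    """
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Text: {text}")
        parts.append(f"Label: {label}")
        parts.append("")
    parts.append(f"Text: {query}")
    parts.append("Label:")
    return "\n".join(parts)


def parse_label(completion: str, labels: List[str]) -> str:
    """Map the model's raw continuation to an allowed label by its first word."""
    words = completion.strip().split()
    first = words[0].strip(".,").lower() if words else ""
    for label in labels:
        if first == label.lower():
            return label
    return "unknown"
```

Base models often keep generating further `Text:`/`Label:` pairs after the answer, which is why `parse_label` reads only the first word; stopping generation at the first newline is an alternative.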

Limitations and Ethical Considerations

As a base model, Granite-3.1-3B-A800M-Base has not undergone safety alignment and may therefore produce problematic outputs. Users should be aware of risks such as bias, misinformation, and hallucination; smaller models such as this one may be more susceptible to hallucination than larger ones. Responsible and ethical deployment is strongly encouraged.
