Model Overview

Granite-3.1-3B-A800M-Instruct is a 3 billion parameter instruction-tuned model from IBM's Granite Team, designed for long-context applications. It is built upon a decoder-only dense transformer architecture, incorporating features like Grouped-query Attention (GQA), Rotary Position Embeddings (RoPE), SwiGLU activation, and RMSNorm. The model was fine-tuned using a mix of permissively licensed open-source datasets and internally generated synthetic data specifically targeting long-context problems.

Key Capabilities

Long-context tasks: Excels in processing and understanding extensive documents for summarization and question-answering.
Multilingual support: Capable of handling dialogue in English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
General instruction following: Designed to respond to a wide range of instructions for AI assistant applications.
Diverse NLP tasks: Supports summarization, text classification, extraction, question-answering, Retrieval Augmented Generation (RAG), and code-related tasks.

Training and Architecture

The model's training involved supervised fine-tuning, reinforcement learning for alignment, and model merging techniques. It was trained on IBM's Blue Vela supercomputing cluster, utilizing NVIDIA H100 GPUs. The architecture includes 32 layers, 24 attention heads, and a 128K sequence length, with 3.3 billion parameters and 800 million active parameters, trained on 10 trillion tokens.

Intended Use Cases

This model is suitable for developing AI assistants across various domains, particularly for business applications requiring robust performance in:

Summarizing long documents or meetings.
Performing question-answering over extensive texts.
Code generation and related tasks.
Multilingual conversational agents.
Function-calling tasks.

Overview

Model Overview

Key Capabilities

Training and Architecture

Intended Use Cases

Full Model Card (README)