Overview

Granite-3.0-2B-Instruct is a 2 billion parameter instruction-tuned language model developed by the Granite Team at IBM. It is built upon a decoder-only dense transformer architecture, incorporating features like GQA, RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. The model was fine-tuned from its base version using a combination of open-source instruction datasets, internally collected synthetic data, and human-curated data, employing supervised fine-tuning, reinforcement learning for alignment, and model merging techniques.

Key Capabilities

General Instruction Following: Designed to respond to a wide range of instructions.
Multilingual Support: Supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with potential for fine-tuning in additional languages.
Diverse NLP Tasks: Proficient in summarization, text classification, text extraction, question-answering, and Retrieval Augmented Generation (RAG).
Code and Function-calling: Capable of handling code-related tasks and function-calling scenarios.

Training and Architecture

The model features a 2048 embedding size, 40 layers, 32 attention heads, and a 4096 sequence length. It was trained on 12 trillion tokens using IBM's Blue Vela supercomputing cluster, which utilizes 100% renewable energy sources. The model's alignment process considered safety, though users are advised to conduct their own safety testing for specific applications.

Intended Use Cases

This model is suitable for building AI assistants across various domains, including business applications, and for multilingual dialog use cases. While it handles multilingual tasks, performance may vary compared to English, and few-shot examples can enhance accuracy for non-English tasks.

Overview

Overview

Key Capabilities

Training and Architecture

Intended Use Cases

Full Model Card (README)