Granite-3.0-8B-Instruct Overview

Granite-3.0-8B-Instruct is an 8.1 billion parameter instruction-tuned model developed by the IBM Granite Team. It is built upon the Granite-3.0-8B-Base model and incorporates advanced techniques including supervised fine-tuning, reinforcement learning for alignment, and model merging. The architecture features a decoder-only dense transformer with GQA, RoPE, SwiGLU MLP, RMSNorm, and shared input/output embeddings, supporting a 32,768 token sequence length.

Key Capabilities

General Instruction Following: Designed to respond to a wide range of instructions.
Multilingual Support: Capable of handling dialog in English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
Diverse NLP Tasks: Proficient in summarization, text classification, text extraction, and question-answering.
Advanced AI Applications: Supports Retrieval Augmented Generation (RAG), code-related tasks, and function-calling.

Training and Development

The model was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs. Its training data comprises a combination of publicly available instruction datasets with permissive licenses, internally collected synthetic datasets, and a small amount of human-curated data. IBM emphasizes continuous improvement and recommends checking out their latest Granite 3.1 models for updates.

Overview

Granite-3.0-8B-Instruct Overview

Key Capabilities

Training and Development

Full Model Card (README)