Overview

Granite-3.1-8B-Instruct is an 8 billion parameter instruction-tuned model developed by the IBM Granite Team. It is fine-tuned from the Granite-3.1-8B-Base model, leveraging a combination of open-source instruction datasets, internally collected synthetic data tailored for long-context problems, and a small amount of human-curated data. The model incorporates supervised finetuning, reinforcement learning for alignment, and model merging techniques.

Key Capabilities

Long-Context Tasks: Designed to handle extensive inputs for tasks like long document summarization and question-answering.
Multilingual Support: Supports 12 languages including English, German, Spanish, French, Japanese, and Chinese, with potential for finetuning in additional languages.
General Instruction Following: Excels at responding to diverse instructions for building AI assistants.
Broad Task Performance: Capable of summarization, text classification, text extraction, question-answering, Retrieval Augmented Generation (RAG), code-related tasks, and function-calling.

Architecture and Performance

Built on a decoder-only dense transformer architecture, Granite-3.1-8B-Instruct utilizes GQA, RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. It features a 32768-token sequence length. On the HuggingFace Open LLM Leaderboard V1, it achieved an average score of 71.31 across benchmarks like ARC-Challenge, MMLU, and GSM8K. For V2, it scored an average of 30.55, demonstrating strong performance in areas like IFEval and BBH.

Overview

Overview

Key Capabilities

Architecture and Performance

Full Model Card (README)