ibm-granite/granite-3.1-3b-a800m-instruct
Granite-3.1-3B-A800M-Instruct is a 3 billion parameter instruction-tuned causal language model developed by IBM's Granite Team. This model is optimized for long-context tasks, multilingual dialogue, and general instruction following, leveraging a decoder-only dense transformer architecture with a 32,768 token context length. It is fine-tuned using a combination of open-source and synthetic datasets, making it suitable for building AI assistants for business applications, including summarization, Q&A, and code-related tasks.
Loading preview...
Model Overview
Granite-3.1-3B-A800M-Instruct is a 3 billion parameter instruction-tuned model from IBM's Granite Team, designed for long-context applications. It is built upon a decoder-only dense transformer architecture, incorporating features like Grouped-query Attention (GQA), Rotary Position Embeddings (RoPE), SwiGLU activation, and RMSNorm. The model was fine-tuned using a mix of permissively licensed open-source datasets and internally generated synthetic data specifically targeting long-context problems.
Key Capabilities
- Long-context tasks: Excels in processing and understanding extensive documents for summarization and question-answering.
- Multilingual support: Capable of handling dialogue in English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
- General instruction following: Designed to respond to a wide range of instructions for AI assistant applications.
- Diverse NLP tasks: Supports summarization, text classification, extraction, question-answering, Retrieval Augmented Generation (RAG), and code-related tasks.
Training and Architecture
The model's training involved supervised fine-tuning, reinforcement learning for alignment, and model merging techniques. It was trained on IBM's Blue Vela supercomputing cluster, utilizing NVIDIA H100 GPUs. The architecture includes 32 layers, 24 attention heads, and a 128K sequence length, with 3.3 billion parameters and 800 million active parameters, trained on 10 trillion tokens.
Intended Use Cases
This model is suitable for developing AI assistants across various domains, particularly for business applications requiring robust performance in:
- Summarizing long documents or meetings.
- Performing question-answering over extensive texts.
- Code generation and related tasks.
- Multilingual conversational agents.
- Function-calling tasks.