ibm-granite/granite-3.0-3b-a800m-instruct
Granite-3.0-3B-A800M-Instruct is a 3.3 billion parameter instruction-tuned decoder-only sparse Mixture of Experts (MoE) transformer model developed by IBM. Finetuned from Granite-3.0-3B-A800M-Base-4K, it is designed to respond to general instructions and build AI assistants for various domains, including business applications. The model excels at tasks such as summarization, text classification, question-answering, RAG, code-related tasks, and multilingual dialog use cases, supporting 12 languages.
Loading preview...
Granite-3.0-3B-A800M-Instruct Overview
Granite-3.0-3B-A800M-Instruct is a 3.3 billion parameter instruction-tuned model developed by the IBM Granite Team. It is finetuned from the Granite-3.0-3B-A800M-Base-4K base model using a combination of open-source instruction datasets, internal synthetic datasets, and human-curated data. The model incorporates supervised finetuning, reinforcement learning for alignment, and model merging techniques.
Key Architectural Features
This model is based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture, featuring:
- Fine-grained Experts: Enhances model specialization.
- Dropless Token Routing: Optimizes token distribution across experts.
- Load Balancing Loss: Ensures efficient utilization of experts.
Capabilities and Intended Use
Designed to respond to general instructions, this model is suitable for building AI assistants across multiple domains, including business applications. It supports 12 languages (English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese) and is capable of:
- Summarization
- Text classification and extraction
- Question-answering and Retrieval Augmented Generation (RAG)
- Code-related tasks and function-calling
- Multilingual dialog use cases
Training and Infrastructure
The model was trained on IBM's Blue Vela supercomputing cluster, utilizing NVIDIA H100 GPUs and powered by 100% renewable energy sources. While primarily finetuned with English instruction-response pairs, it also includes multilingual data. Users should perform safety testing and tuning for specific tasks, as performance in non-English languages might benefit from few-shot examples.