ibm-granite/granite-3.0-1b-a400m-instruct
Granite-3.0-1B-A400M-Instruct is an IBM-developed 1 billion parameter instruction-tuned language model based on a decoder-only sparse Mixture of Experts (MoE) transformer architecture. It is fine-tuned from Granite-3.0-1B-A400M-Base using diverse techniques including supervised finetuning and reinforcement learning. This model excels at general instruction following, summarization, text classification, question-answering, and code-related tasks, supporting 12 languages for building AI assistants.
Loading preview...
Granite-3.0-1B-A400M-Instruct: A Sparse MoE Language Model
Granite-3.0-1B-A400M-Instruct is a 1 billion parameter instruction-tuned model developed by the Granite Team at IBM. It is built upon a decoder-only sparse Mixture of Experts (MoE) transformer architecture, distinguishing it from dense models. Key architectural components include Fine-grained Experts, Dropless Token Routing, and Load Balancing Loss, enabling efficient performance with 400 million active parameters despite its 1.3 billion total parameters. The model was trained on 10 trillion tokens and fine-tuned using a combination of open-source instruction datasets, internal synthetic data, and human-curated data, incorporating supervised finetuning and reinforcement learning for alignment.
Key Capabilities
- Multilingual Support: Handles English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
- Instruction Following: Designed to respond to general instructions for AI assistants.
- Diverse NLP Tasks: Proficient in summarization, text classification, text extraction, question-answering, and Retrieval Augmented Generation (RAG).
- Code & Function-calling: Capable of handling code-related tasks and function-calling scenarios.
Good For
- Building AI assistants for various domains, including business applications.
- Tasks requiring efficient processing and generation in a multilingual context.
- Applications benefiting from a sparse MoE architecture for optimized resource usage.