Model Overview

Granite-3.0-1B-A400M-Base is a 1.3-billion-parameter, decoder-only language model from IBM's Granite Team, designed for a range of text-to-text generation tasks. It uses a sparse Mixture of Experts (MoE) transformer architecture with 32 experts, of which 8 are active for each token (MoE TopK), so only about 400 million parameters are active during inference. The model was trained with a two-stage strategy on a total of 10 trillion tokens drawn from web, code, academic, book, and math data; the second stage concentrated on high-quality, curated data to improve task-specific performance.
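To make the "32 experts, 8 active" figure concrete, here is a minimal sketch of top-k expert routing in plain Python. Only NUM_EXPERTS and TOP_K come from the description above; the toy experts, the random router scores, and the choice to apply softmax over the selected experts' scores are assumptions for illustration, not a description of Granite's actual implementation.

```python
import math
import random

NUM_EXPERTS = 32  # from the model description
TOP_K = 8         # experts active per token, from the model description


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k highest-scoring experts and normalize their weights.

    Assumption: weights are a softmax over the selected logits only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(token, experts, router_logits, top_k=TOP_K):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits, top_k):
        expert_out = experts[idx](token)  # the other experts are never called
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores

selected = route_token(logits)
output = moe_layer([1.0, 2.0], experts, logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts ran for this token")
```

Because only the selected experts are evaluated, per-token compute scales with the active parameters rather than the full parameter count, which is the efficiency argument behind the 400M active parameter figure.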

Key Capabilities

Efficient Performance: The sparse MoE architecture activates only about 400M of the model's 1.3B parameters per token, which keeps inference cost low.
Broad Task Support: Handles summarization, text classification, extraction, and question answering.
Multilingual Support: Supports 12 languages: English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
Foundation Model: Serves as a baseline for creating specialized models through fine-tuning.
Extensive Training: Trained on 10 trillion tokens from diverse domains (web, code, academic, books, and math) for broad general-purpose coverage.
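
As a base model, it is usually steered toward tasks such as classification through the prompt itself rather than through chat-style instructions. The sketch below builds a few-shot classification prompt as a plain string. The helper name build_few_shot_prompt, the "Text:" / "Label:" layout, and the sample reviews are all hypothetical choices for illustration, not a format prescribed by the model card.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt that ends where the model should continue.

    examples is a list of (text, label) pairs; the layout is an assumption.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    # Leave the final label blank so the model's continuation is the answer.
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [
        ("The battery lasts all day.", "positive"),
        ("The screen cracked within a week.", "negative"),
    ],
    "Setup was quick and painless.",
)
print(prompt)
```

The resulting string can be passed to whatever text-generation interface is in use; the first few generated tokens after the trailing "Label:" are then read as the predicted class.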

Good For

General Text-to-Text Generation: Suited to common LLM applications such as summarization and classification.
Specialized Model Development: Suitable as a base model for fine-tuning on specific application scenarios.
Multilingual Applications: Useful for tasks that involve text in any of the 12 supported languages.
Research and Development: Offers a compact sparse MoE model for exploring efficient language model applications.
