ibm-granite/granite-3.0-1b-a400m-base

Text Generation
Concurrency Cost: 1
Model Size: 1B
Quant: BF16
Ctx Length: 32k
Published: Oct 3, 2024
License: apache-2.0
Architecture: Transformer
Open Weights

Granite-3.0-1B-A400M-Base is a 1.3 billion parameter decoder-only sparse Mixture of Experts (MoE) language model developed by the Granite Team at IBM. It was trained on 10 trillion tokens across two stages and uses 400 million active parameters, which keeps inference cost low for text-to-text generation tasks. It targets summarization, classification, extraction, and question-answering, and supports 12 languages including English, German, and Chinese.

Model Overview

Granite-3.0-1B-A400M-Base is a 1.3 billion parameter decoder-only language model from IBM's Granite Team, designed for diverse text-to-text generation tasks. It uses a sparse Mixture of Experts (MoE) transformer architecture with 32 experts, of which 8 are active at a time (MoE TopK = 8), resulting in 400 million active parameters. The model was trained with a two-stage strategy on a total of 10 trillion tokens drawn from web, code, academic, books, and math data. The second stage focused on high-quality, curated data to improve task-specific performance.
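The top-k expert selection described above can be illustrated with a small, generic sketch. It is not IBM's implementation: the expert and top-k counts come from the description, but the toy scalar experts, the random router logits, and the choice to softmax-normalize over only the selected experts are assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 32  # from the model description
TOP_K = 8         # active experts, from the model description


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and return {expert_index: weight}.

    Weights are a softmax over the selected logits only. This is an assumed,
    commonly used convention; the source does not describe the normalization.
    """
    # Indices of the top_k largest router logits.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Numerically stable softmax restricted to the chosen experts.
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(token, experts, router_logits):
    """Combine the outputs of the selected experts, weighted by the router."""
    weights = route_token(router_logits)
    return sum(w * experts[i](token) for i, w in weights.items())


# Hypothetical toy experts: each scales a scalar input by a fixed factor.
rng = random.Random(0)
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

weights = route_token(logits)
output = moe_layer(1.0, experts, logits)
print(len(weights), sum(weights.values()))
```

Only the 8 selected experts are evaluated for the token, which is why the active parameter count is much smaller than the total.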

Key Capabilities

  • Efficient Performance: Utilizes a sparse MoE architecture with 400M active parameters for efficient inference.
  • Broad Task Support: Capable of handling summarization, text classification, extraction, and question-answering.
  • Multilingual Support: Supports 12 languages including English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
  • Foundation Model: Serves as a strong baseline for creating specialized models through fine-tuning.
  • Extensive Training: Trained on 10 trillion tokens from diverse domains (web, code, academic, books, and math) to give broad general-purpose coverage.
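As a rough back-of-the-envelope check on the efficiency claim, the figures quoted on this page can be combined as follows. The parameter counts and the BF16 format come from the listing; the memory estimate covers weights only and ignores activations, the KV cache, and framework overhead, so treat it as a lower bound rather than a measured value.

```python
TOTAL_PARAMS = 1.3e9    # total parameters, from the description
ACTIVE_PARAMS = 400e6   # active parameters, from the description
BYTES_PER_PARAM = 2     # BF16 stores each value in 16 bits

# Share of the weights that take part in computing a single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# All experts must be resident in memory, so storage scales with the total.
weight_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9

print(f"active fraction: {active_fraction:.1%}")             # 30.8%
print(f"approx. BF16 weight memory: {weight_gb:.1f} GB")     # 2.6 GB
```

Compute per token scales with the active parameters, while memory scales with the total parameter count.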

Good For

  • General Text-to-Text Generation: Ideal for common LLM applications like summarization and classification.
  • Specialized Model Development: Suitable as a base model for fine-tuning on specific application scenarios.
  • Multilingual Applications: Useful for tasks requiring processing in the 12 supported languages.
  • Research and Development: Offers a small sparse MoE model for exploring efficient language model applications.
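Because this is a base model rather than an instruction-tuned one, tasks such as classification are usually posed as few-shot text completions. The sketch below shows one way to do that. The prompt layout, the helper names `build_few_shot_prompt` and `generate`, and the example data are hypothetical, and `generate` assumes a recent Hugging Face `transformers` release with Granite MoE support plus PyTorch are installed.

```python
MODEL_ID = "ibm-granite/granite-3.0-1b-a400m-base"


def build_few_shot_prompt(examples, query,
                          instruction="Classify the sentiment as positive or negative."):
    """Lay out labeled examples followed by the unlabeled query (assumed format)."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


def generate(prompt, max_new_tokens=5):
    """Complete the prompt with the base model; downloads weights on first use."""
    # Imported lazily so the prompt builder works without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Setup was quick and painless.")
print(prompt)
# To run the model: print(generate(prompt))
```

The same pattern extends to summarization or extraction by changing the instruction line and the field labels.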