Granite-3.0-2B-Base: A Foundation for Text Generation

Granite-3.0-2B-Base, developed by the IBM Granite Team, is a 2.5 billion parameter decoder-only language model for text-to-text generation tasks. It was released on October 21, 2024, under the Apache 2.0 license. The model uses a dense transformer architecture with grouped-query attention (GQA), rotary position embeddings (RoPE), SwiGLU activations, RMSNorm, and shared input/output embeddings, and it supports a sequence length of 4096 tokens.
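Because the model is decoder-only, the prompt and the generated text share the same 4096-token sequence length. The following is a minimal sketch of that budget arithmetic; the helper name `max_new_tokens` is hypothetical and is not part of any Granite or library API.

```python
CONTEXT_LENGTH = 4096  # sequence length stated in the description above


def max_new_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated after a prompt.

    Hypothetical helper for illustration: prompt tokens and generated
    tokens together must fit inside the context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, leaving no room "
            f"in a {context_length}-token window"
        )
    return remaining


# A 4000-token prompt leaves room for 96 generated tokens.
budget = max_new_tokens(4000)
```

In practice the prompt token count would come from the model's tokenizer, so a long document may need truncating or chunking before it is passed to the model.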

Key Capabilities & Training:

Two-Stage Training: The model was trained first on 10 trillion tokens from diverse domains (web, code, academic sources, books, and math), then on a further 2 trillion tokens of high-quality, curated data to improve performance on specific tasks.
Multilingual Support: It supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, and it can be fine-tuned for other languages.
Versatile Applications: It is intended for common LLM use cases such as summarization, text classification, extraction, and question-answering. It also serves as a baseline for creating specialized models.
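A base model continues text and does not follow instructions, so tasks such as text classification are usually framed as a few-shot prompt that the model completes. The sketch below shows one way to do that. It assumes the Hugging Face `transformers` library (with PyTorch) and the model ID `ibm-granite/granite-3.0-2b-base`, neither of which is stated in this text. The helper names, labels, and example reviews are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed Hugging Face model ID; not stated in the text above.
MODEL_ID = "ibm-granite/granite-3.0-2b-base"


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Review",
    output_label: str = "Sentiment",
) -> str:
    """Format labeled examples plus one unlabeled query as a completion prompt."""
    blocks = [
        f"{input_label}: {text}\n{output_label}: {label}"
        for text, label in examples
    ]
    # The final block ends at the label so the model's continuation is the answer.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


def complete(prompt: str, max_new_tokens: int = 5) -> str:
    """Generate a short continuation of the prompt with the base model."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Hypothetical labeled examples for a sentiment classification task.
EXAMPLES = [
    ("Great battery life.", "positive"),
    ("Stopped working after a week.", "negative"),
]
prompt = build_few_shot_prompt(EXAMPLES, "Fast shipping and works well.")
```

Calling `complete(prompt)` downloads the model weights on first use and returns the model's continuation, which may include more than the label and should be trimmed or validated by the caller.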

Ethical Considerations:

IBM notes that although Granite-3.0-2B-Base is suitable for many generative AI tasks, it has not undergone safety alignment and may produce problematic outputs. Users are urged to consider risks such as bias, misinformation, and malicious use, and to use the model ethically and responsibly.
