ibm-granite/granite-3.0-2b-base

Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Oct 2, 2024 | License: apache-2.0 | Architecture: Transformer | Open Weights

Granite-3.0-2B-Base is a 2.5 billion parameter decoder-only language model developed by IBM, designed for text-to-text generation tasks. It was trained using a two-stage strategy on 12 trillion tokens from diverse domains, including web, code, academic, books, and math data. This model supports 12 languages and is intended for tasks such as summarization, text classification, extraction, and question-answering, also serving as a baseline for specialized models. Its architecture incorporates GQA, RoPE, SwiGLU, RMSNorm, and shared input/output embeddings, with a sequence length of 4096.


Granite-3.0-2B-Base: A Foundation for Text Generation

Granite-3.0-2B-Base, developed by the IBM Granite Team, is a 2.5 billion parameter decoder-only language model designed for a wide range of text-to-text generation tasks. It was released on October 21st, 2024, under an Apache 2.0 license. The model is a dense transformer that uses grouped-query attention (GQA), rotary position embeddings (RoPE), SwiGLU activations in the feed-forward layers, RMSNorm normalization, and shared input/output embeddings, and it supports a sequence length of 4096 tokens.
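To make two of the architectural components named above more concrete, here is a minimal pure-Python sketch of RMSNorm and a SwiGLU feed-forward block. It is an illustration of the general techniques only, not IBM's implementation: the function names, the `eps` default, and the tiny weight matrices are assumptions chosen for readability, and a real model would use batched tensor operations.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Divide x by its root-mean-square, then apply a per-channel scale.

    `eps` is an assumed stabilizer value, not taken from the model card.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) )."""
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Hypothetical toy inputs: a 2-dimensional hidden state and identity projections.
x = [3.0, 4.0]
normed = rms_norm(x, weight=[1.0, 1.0])
identity = [[1.0, 0.0], [0.0, 1.0]]
out = swiglu_ffn(normed, identity, identity, [[1.0, 1.0]])
```

After `rms_norm`, the vector has a root-mean-square of roughly 1, which is the property the normalization is meant to provide. The gated product in `swiglu_ffn` is what distinguishes SwiGLU from a plain two-layer MLP.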

Key Capabilities & Training:

  • Two-Stage Training: The model underwent a two-stage training process, initially on 10 trillion tokens from diverse domains (web, code, academic, books, math), followed by an additional 2 trillion tokens of high-quality, curated data to enhance performance on specific tasks.
  • Multilingual Support: It supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with potential for fine-tuning for other languages.
  • Versatile Applications: Intended for common LLM use cases like summarization, text classification, extraction, and question-answering. It also serves as a robust baseline for creating specialized models.
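Because this is a base model rather than an instruction-tuned one, tasks such as text classification are commonly framed as text completion with a few worked examples in the prompt. The sketch below shows one way to assemble such a prompt. The `build_few_shot_prompt` helper, the "Review:/Label:" layout, and the sample reviews are all hypothetical, and the model card does not prescribe any prompt format.

```python
def build_few_shot_prompt(examples, query, labels):
    """Build a completion-style classification prompt.

    examples: list of (text, label) pairs shown to the model as demonstrations.
    query:    the new text to classify; the prompt ends right after "Label:"
              so the model's continuation is the predicted label.
    labels:   the allowed label names, listed in the instruction line.
    """
    header = "Classify each review as " + " or ".join(labels) + ".\n\n"
    shots = "".join(
        f"Review: {text}\nLabel: {label}\n\n" for text, label in examples
    )
    return header + shots + f"Review: {query}\nLabel:"

# Hypothetical demonstrations and query.
prompt = build_few_shot_prompt(
    examples=[
        ("The battery lasts all day.", "positive"),
        ("It broke after a week.", "negative"),
    ],
    query="Setup was quick and painless.",
    labels=["positive", "negative"],
)
print(prompt)
```

The resulting string can be passed to whatever text-generation interface is used to serve the model. Limiting generation to a few new tokens keeps the continuation close to a single label.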

Ethical Considerations:

IBM notes that, while the model is suitable for many generative AI tasks, Granite-3.0-2B-Base has not undergone safety alignment and may produce problematic outputs. Users are urged to consider risks such as bias, misinformation, and malicious use, and to use the model ethically and responsibly.