ibm-granite/granite-3.3-2b-base

Task: Text Generation | Concurrency Cost: 1 | Model Size: 2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 9, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

Granite-3.3-2B-Base is a 2 billion parameter decoder-only language model developed by IBM, with a 128K token context window. It extends its predecessor with Fill-in-the-Middle (FIM) support, which makes it well suited to code completion. The model is designed for general text-to-text generation, including summarization, classification, and question-answering, and serves as a baseline for fine-tuning into specialized applications.

Granite-3.3-2B-Base: An IBM Language Model for Code Completion and General Text Generation

Granite-3.3-2B-Base, developed by the IBM Granite Team, is a 2 billion parameter decoder-only language model with a 128K token context window. It improves on its 3.1 predecessor by adding Fill-in-the-Middle (FIM) support through specialized tokens, so it can generate content conditioned on both a prefix and a suffix. This makes it particularly well suited to tasks such as code completion.
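
To illustrate what conditioning on both prefix and suffix looks like in practice, here is a minimal sketch of assembling a FIM prompt in prefix-suffix-middle order. The marker strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` and the helper name `build_fim_prompt` are assumptions for illustration; this page does not list the model's actual special tokens, so check them against the tokenizer before use.

```python
# Assumed FIM marker strings (not confirmed by this page); verify them
# against the tokenizer's special tokens before relying on them.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(
    prefix: str,
    suffix: str,
    prefix_tok: str = FIM_PREFIX,
    suffix_tok: str = FIM_SUFFIX,
    middle_tok: str = FIM_MIDDLE,
) -> str:
    """Arrange the text before and after the gap in prefix-suffix-middle order.

    The model is expected to continue after the middle marker with the
    content that belongs between `prefix` and `suffix`.
    """
    return f"{prefix_tok}{prefix}{suffix_tok}{suffix}{middle_tok}"


if __name__ == "__main__":
    # Ask for the body of `add`, given the code before and after the gap.
    before = "def add(a, b):\n    return "
    after = "\n\nprint(add(1, 2))\n"
    print(build_fim_prompt(before, after))
```

The resulting string would be tokenized and passed to the model like any other prompt; the generated continuation is the proposed middle section.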

Key Capabilities and Features

  • Fill-in-the-Middle (FIM) Support: Enhanced ability to generate text based on surrounding context, ideal for code completion.
  • Large Context Window: Supports up to 128K tokens for long-context tasks.
  • Multilingual Support: Trained to support English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, and Chinese.
  • Versatile Text Generation: Capable of summarization, text classification, extraction, and question-answering.
  • Base Model for Specialization: Designed to serve as a foundational model for fine-tuning into specialized applications.

Model Architecture and Training

The model is built on a decoder-only dense transformer architecture that combines grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. It was trained in three stages on a mix of open-source and proprietary data, including web, code, academic, books, and math data, with the later stages focusing on high-quality and synthetic long-context data. Training ran on IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs.
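
For readers unfamiliar with these building blocks, the following is a small pure-Python sketch of three of them: RMSNorm, a SwiGLU MLP, and the query-to-key/value head mapping used in GQA. The function names, toy dimensions, and the `eps` default are illustrative assumptions, not the model's actual configuration or IBM's implementation.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root mean square, then by a learned weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(matrix, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: down(silu(gate(x)) * up(x)), with an elementwise product."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """GQA: consecutive groups of query heads share one key/value head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    return q_head // (n_q_heads // n_kv_heads)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(rms_norm([3.0, 4.0], [1.0, 1.0]))
    print(swiglu_mlp([1.0, 0.0], identity, identity, identity))
    # Toy head counts: 8 query heads sharing 2 key/value heads.
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

With fewer key/value heads than query heads, GQA shrinks the key/value cache, which matters most at long context lengths.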

Considerations

While versatile, Granite-3.3-2B-Base has not undergone safety alignment and may produce problematic outputs. Users should be aware of potential biases, misinformation, and the ethical implications of LLM use.