Granite-3.3-2B-Base: An IBM Language Model for Code Completion and General Text Generation

Granite-3.3-2B-Base, developed by the IBM Granite Team, is a 2-billion-parameter decoder-only language model with a 128K-token context window. It extends its 3.1 predecessor with Fill-in-the-Middle (FIM) support: specialized tokens mark a prefix and a suffix, and the model generates the content that belongs between them. This makes it well suited to tasks such as code completion, where text exists on both sides of the cursor.
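To make the FIM idea concrete, here is a minimal sketch of how a prefix-suffix-middle prompt might be assembled. The sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` and the helper `build_fim_prompt` are assumptions for illustration; this summary does not name the actual tokens, so check the tokenizer's special tokens in the full model card before relying on them.

```python
# Sentinel token strings are ASSUMPTIONS (StarCoder-style names), not taken
# from this summary; verify them against the model's tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: lay out prefix and suffix so the model
    continues after the middle sentinel with the missing span."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Code before and after the gap the model should fill.
prefix = "def add(a, b):\n    return "
suffix = "\n\nprint(add(1, 2))\n"

prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting string would then be tokenized and passed to the model's generation call; whatever the model emits after the middle sentinel is the proposed infill.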

Key Capabilities and Features

Fill-in-the-Middle (FIM) Support: Generates text conditioned on both the preceding and the following context, which suits code completion.
Large Context Window: Accepts up to 128K tokens, enough for long-context tasks.
Multilingual Support: Trained to support English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, and Chinese.
Versatile Text Generation: Capable of summarization, text classification, extraction, and question-answering.
Base Model for Specialization: Designed to serve as a foundational model for fine-tuning into specialized applications.

Model Architecture and Training

The model is a dense decoder-only transformer that uses grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. It was trained in three stages on a mix of open-source and proprietary data covering web text, code, academic sources, books, and math; the later stages focused on high-quality data and synthetic long-context data. Training ran on IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs.
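Two of the components named above, RMSNorm and the SwiGLU gate, are small enough to sketch in plain Python. This is a generic illustration of the two techniques, not IBM's implementation: the function names, the `eps` value, and the toy inputs are assumptions, and the learned linear projections that surround the gate in a real MLP are omitted.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide by the root mean square of the vector, then apply
    a per-dimension gain. The eps value is an assumed default."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * g for v, g in zip(x, gain)]


def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def swiglu(gate, up):
    """SwiGLU gating: elementwise silu(gate) * up. In a full MLP, gate and
    up come from two separate linear projections of the same input, and a
    third projection maps the product back to the model width."""
    return [silu(g) * u for g, u in zip(gate, up)]


normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0])
gated = swiglu([0.0, 1.0], [5.0, 2.0])
print(normed, gated)
```

With unit gain, the normalized vector has a root mean square of approximately 1, and a zero gate value suppresses its channel entirely.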

Considerations

While versatile, Granite-3.3-2B-Base has not undergone safety alignment and may produce problematic outputs. Users should be aware of potential biases, misinformation, and the ethical implications of LLM use.
