ibm-granite/granite-3.3-2b-base
Granite-3.3-2B-Base is a 2-billion-parameter decoder-only language model developed by IBM, with a 128K-token context window. It adds Fill-in-the-Middle (FIM) support over its predecessor, which makes it well suited to code completion. The model is intended for general text-to-text generation, including summarization, classification, and question answering, and as a baseline for building specialized applications.
Granite-3.3-2B-Base: An IBM Language Model for Code Completion and General Text Generation
Granite-3.3-2B-Base, developed by the IBM Granite Team, is a 2-billion-parameter decoder-only language model with a 128K-token context window. It improves on its 3.1 predecessor by adding Fill-in-the-Middle (FIM) support through specialized tokens, so it can generate content conditioned on both a prefix and a suffix. This makes it well suited to tasks such as code completion.
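To make the prefix-and-suffix conditioning concrete, here is a minimal sketch of how a FIM prompt might be assembled. The sentinel token names (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) are an assumption based on a common FIM convention and are not stated in this description; check the tokenizer's special tokens before relying on them. The helper names `build_fim_prompt` and `fill_in_middle` are hypothetical.

```python
# Assumed FIM sentinel tokens; verify them against the tokenizer's special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the known text in prefix-suffix-middle order.

    The model is expected to continue after the middle sentinel with the
    text that belongs between `prefix` and `suffix`.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def fill_in_middle(prefix: str, suffix: str, max_new_tokens: int = 64) -> str:
    """Generate the missing middle with Hugging Face transformers (hypothetical helper)."""
    # Imported lazily so the prompt builder works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ibm-granite/granite-3.3-2b-base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_fim_prompt(prefix, suffix), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, i.e. the proposed middle.
    new_ids = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True)


# Build a prompt asking for the body between a signature and a return statement.
prompt = build_fim_prompt("def add(a, b):\n", "\n    return total\n")
print(prompt)
```

Calling `fill_in_middle` with the same two strings would download the model weights and return whatever the model proposes for the gap; because this is a base model, the output is a raw continuation and may need trimming.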
Key Capabilities and Features
- Fill-in-the-Middle (FIM) Support: Generates text conditioned on both the preceding and the following context, which suits code completion.
- Large Context Window: Supports up to 128K tokens, enabling handling of long-context tasks.
- Multilingual Support: Trained to support English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, and Chinese.
- Versatile Text Generation: Capable of summarization, text classification, extraction, and question-answering.
- Base Model for Specialization: Designed to serve as a foundational model for fine-tuning into specialized applications.
Model Architecture and Training
The model is built on a decoder-only dense transformer architecture that uses grouped-query attention (GQA), rotary position embeddings (RoPE), an MLP with SwiGLU activation, RMSNorm, and shared input/output embeddings. It was trained in three stages on a mix of open-source and proprietary data covering web, code, academic, books, and math sources, with the later stages focusing on high-quality data and synthetic long-context data. Training ran on IBM's Blue Vela supercomputing cluster with NVIDIA H100 GPUs.
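As an illustration of two of the building blocks named above, the following is a minimal pure-Python sketch of RMSNorm and a SwiGLU MLP. It follows the standard textbook definitions rather than the model's actual implementation; the function names, toy dimensions, and `eps` value are assumptions chosen for readability.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gain)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU MLP: w_down @ (silu(w_gate @ x) * (w_up @ x))."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Toy example with 2-dimensional vectors and identity weight matrices.
identity = [[1.0, 0.0], [0.0, 1.0]]
normed = rms_norm([3.0, 4.0], gain=[1.0, 1.0], eps=0.0)
mlp_out = swiglu_mlp([0.0, 1.0], identity, identity, identity)
print(normed, mlp_out)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so the normalized vector has unit root mean square but not necessarily zero mean.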
Considerations
While versatile, Granite-3.3-2B-Base has not undergone safety alignment and may produce problematic outputs. Users should be aware of potential biases, misinformation, and the ethical implications of LLM use.