ibm-granite/granite-3.3-8b-base
Granite-3.3-8B-Base is an 8.1-billion-parameter, decoder-only language model developed by IBM with a 128K-token context window. It extends its predecessor, Granite-3.1-8B-Base, with Fill-in-the-Middle (FIM) support, which suits it to code completion. It also handles general text-to-text generation, summarization, classification, and question answering across 12 supported languages.
Model Overview
Granite-3.3-8B-Base is an 8.1-billion-parameter, decoder-only language model from IBM with a 128K-token context window. It builds on Granite-3.1-8B-Base by adding Fill-in-the-Middle (FIM) capability: specialized tokens mark a prefix and a suffix, and the model generates the content that belongs between them. This makes it well suited to tasks such as code completion, where text already exists on both sides of the insertion point.
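To make the prefix-and-suffix conditioning concrete, here is a minimal sketch of how a FIM prompt might be assembled. The token strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` and the prefix-suffix-middle ordering are assumptions, since the text above does not name the specialized tokens; check the model's tokenizer for the actual strings before relying on them. The helper `build_fim_prompt` is a hypothetical name used only for illustration.

```python
def build_fim_prompt(
    prefix: str,
    suffix: str,
    prefix_token: str = "<fim_prefix>",  # assumed token string, verify in the tokenizer
    suffix_token: str = "<fim_suffix>",  # assumed token string, verify in the tokenizer
    middle_token: str = "<fim_middle>",  # assumed token string, verify in the tokenizer
) -> str:
    """Arrange prefix and suffix in prefix-suffix-middle order.

    The model is then expected to continue after the middle token with
    the text that belongs between the prefix and the suffix.
    """
    return f"{prefix_token}{prefix}{suffix_token}{suffix}{middle_token}"


# Code before and after the gap that should be filled in.
prefix = "def add(a, b):\n    return "
suffix = "\n\nprint(add(2, 3))\n"

prompt = build_fim_prompt(prefix, suffix)
```

The resulting string would be passed to the model as an ordinary prompt; whatever it generates next is the candidate for the missing middle section.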
Key Capabilities
- Fill-in-the-Middle (FIM) Support: Enables generation based on surrounding text, ideal for code completion and similar tasks.
- Long Context Window: Accepts inputs of up to 128K tokens, so long documents or large code files can be processed in a single pass.
- Multilingual Support: Trained to support English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with potential for fine-tuning in other languages.
- General Text-to-Text Generation: Proficient in summarization, text classification, extraction, and question-answering.
- Architecture: A dense decoder-only transformer using grouped-query attention (GQA), rotary position embeddings (RoPE), SwiGLU MLP layers, RMSNorm, and shared input/output embeddings.
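One way to see how GQA and the long context window interact is to estimate the key/value cache size at full context length. The sketch below is back-of-the-envelope arithmetic only. The layer count, head counts, and head dimension in the example calls are hypothetical placeholders, not the published configuration of this model, and "128K" is assumed to mean 131,072 tokens; read the real values from the model's config before drawing conclusions.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Estimate key/value cache size in bytes for a single sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to a 16-bit type such as bfloat16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


SEQ_LEN = 128 * 1024  # assumption: "128K" interpreted as 131,072 tokens

# Hypothetical configuration values, chosen only for illustration.
full_attention = kv_cache_bytes(num_layers=40, num_kv_heads=32, head_dim=128, seq_len=SEQ_LEN)
grouped_query = kv_cache_bytes(num_layers=40, num_kv_heads=8, head_dim=128, seq_len=SEQ_LEN)

print(f"one KV head per query head: {full_attention / 2**30:.0f} GiB")
print(f"grouped-query attention:    {grouped_query / 2**30:.0f} GiB")
```

With these placeholder numbers, sharing each key/value head across four query heads shrinks the cache by a factor of four, which is the kind of saving that makes very long contexts more practical to serve.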
Intended Use Cases
- Code Completion: The main addition over Granite-3.1-8B-Base; FIM lets the model fill in code between an existing prefix and suffix.
- Text Summarization: Condensing long documents or conversations.
- Text Classification: Categorizing text into predefined labels.
- Information Extraction: Pulling specific data points from unstructured text.
- Question Answering: Providing answers to queries based on given context.
- Baseline for Specialized Models: Serves as a base model for fine-tuning toward specific application scenarios.
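Because this is a base model rather than an instruction-tuned one, tasks such as classification are usually framed as text continuation. The following sketch shows one possible few-shot prompt layout and a generation helper built on the Hugging Face `transformers` library. The prompt format, the example texts and labels, and the helper names `build_few_shot_prompt` and `complete` are illustrative assumptions, not a format recommended by IBM.

```python
def build_few_shot_prompt(examples, query):
    """Format labeled examples followed by an unlabeled query."""
    blocks = [f"Text: {text}\nLabel: {label}" for text, label in examples]
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)


def complete(prompt, model_id="ibm-granite/granite-3.3-8b-base", max_new_tokens=5):
    """Generate a continuation; downloads the model weights on first use."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" requires the accelerate package.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Made-up examples for a sentiment classification task.
examples = [
    ("The battery died after two days.", "negative"),
    ("Setup took less than a minute.", "positive"),
]
prompt = build_few_shot_prompt(examples, "The screen is bright and sharp.")

# label = complete(prompt)  # uncomment to run; needs transformers, torch, accelerate
```

The same pattern extends to summarization, extraction, or question answering by changing the example layout; for production use, fine-tuning on task data is the route the list above points to.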