ibm-granite/granite-3.0-2b-instruct
Granite-3.0-2B-Instruct is a 2 billion parameter instruction-tuned decoder-only transformer model developed by the Granite Team at IBM. It is fine-tuned from Granite-3.0-2B-Base using supervised fine-tuning, reinforcement learning, and model merging techniques. This model supports a 32,768 token context length and is designed for general instruction following, excelling in tasks like summarization, text classification, question-answering, and multilingual dialog across 12 languages.
Loading preview...
Overview
Granite-3.0-2B-Instruct is a 2 billion parameter instruction-tuned language model developed by the Granite Team at IBM. It is built upon a decoder-only dense transformer architecture, incorporating features like GQA, RoPE, MLP with SwiGLU, RMSNorm, and shared input/output embeddings. The model was fine-tuned from its base version using a combination of open-source instruction datasets, internally collected synthetic data, and human-curated data, employing supervised fine-tuning, reinforcement learning for alignment, and model merging techniques.
Key Capabilities
- General Instruction Following: Designed to respond to a wide range of instructions.
- Multilingual Support: Supports English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese, with potential for fine-tuning in additional languages.
- Diverse NLP Tasks: Proficient in summarization, text classification, text extraction, question-answering, and Retrieval Augmented Generation (RAG).
- Code and Function-calling: Capable of handling code-related tasks and function-calling scenarios.
Training and Architecture
The model features a 2048 embedding size, 40 layers, 32 attention heads, and a 4096 sequence length. It was trained on 12 trillion tokens using IBM's Blue Vela supercomputing cluster, which utilizes 100% renewable energy sources. The model's alignment process considered safety, though users are advised to conduct their own safety testing for specific applications.
Intended Use Cases
This model is suitable for building AI assistants across various domains, including business applications, and for multilingual dialog use cases. While it handles multilingual tasks, performance may vary compared to English, and few-shot examples can enhance accuracy for non-English tasks.