ibm-granite/granite-3.0-8b-instruct
Granite-3.0-8B-Instruct is an 8.1 billion parameter decoder-only dense transformer model developed by the IBM Granite Team. It is fine-tuned using supervised fine-tuning, reinforcement learning, and model merging, and supports a 32K context length. This model is designed for general instruction following and excels in tasks such as summarization, text classification, question-answering, RAG, and multilingual dialog across 12 languages.
Loading preview...
Granite-3.0-8B-Instruct Overview
Granite-3.0-8B-Instruct is an 8.1 billion parameter instruction-tuned model developed by the IBM Granite Team. It is built upon the Granite-3.0-8B-Base model and incorporates advanced techniques including supervised fine-tuning, reinforcement learning for alignment, and model merging. The architecture features a decoder-only dense transformer with GQA, RoPE, SwiGLU MLP, RMSNorm, and shared input/output embeddings, supporting a 32,768 token sequence length.
Key Capabilities
- General Instruction Following: Designed to respond to a wide range of instructions.
- Multilingual Support: Capable of handling dialog in English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
- Diverse NLP Tasks: Proficient in summarization, text classification, text extraction, and question-answering.
- Advanced AI Applications: Supports Retrieval Augmented Generation (RAG), code-related tasks, and function-calling.
Training and Development
The model was trained on IBM's Blue Vela supercomputing cluster using NVIDIA H100 GPUs. Its training data comprises a combination of publicly available instruction datasets with permissive licenses, internally collected synthetic datasets, and a small amount of human-curated data. IBM emphasizes continuous improvement and recommends checking out their latest Granite 3.1 models for updates.