Overview
Mistral-Nemo-Instruct-2407: A Powerful Instruction-Tuned LLM
Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model developed jointly by Mistral AI and NVIDIA. It is fine-tuned from Mistral-Nemo-Base-2407 and is designed to deliver strong performance, frequently outperforming existing models of similar or smaller size.
Key Capabilities & Features
- Architecture: Transformer with 40 layers, a hidden size of 5,120, and Grouped-Query Attention (GQA) with 8 KV heads (see the sketch after this list).
- Context Window: Features a substantial 128k context window, enabling it to process long inputs and stay coherent across extended conversations.
- Multilingual & Code Data: Trained on a significant proportion of multilingual and code data, enhancing its versatility across different languages and programming tasks.
- Instruction Following: Specifically fine-tuned for instruction following, making it highly effective for chat and command-based interactions.
- Benchmarks: Achieves competitive scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and strong multilingual MMLU scores (e.g., 62.3% French, 62.7% German).
- Tool Use/Function Calling: Supports function calling, allowing integration with external tools and APIs (see the function-calling sketch after this list).
- Licensing: Released under the permissive Apache 2.0 license.
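To make the GQA configuration concrete, below is a minimal, illustrative PyTorch sketch of grouped-query attention using the dimensions quoted above (5,120 hidden size, 8 KV heads). The 32 query heads and 128 head dimension are assumptions drawn from commonly published Mistral-Nemo configurations, not from this card, and the code is a sketch of the technique rather than the model's actual implementation.

```python
# Illustrative grouped-query attention sketch, not the model's real code.
# The 5,120 hidden size and 8 KV heads come from the card above; the 32
# query heads and 128 head dim are assumed values for demonstration.
import torch
import torch.nn.functional as F

d_model, n_q_heads, n_kv_heads, head_dim = 5120, 32, 8, 128

q_proj = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
o_proj = torch.nn.Linear(n_q_heads * head_dim, d_model, bias=False)

def gqa(x: torch.Tensor) -> torch.Tensor:
    b, t, _ = x.shape
    q = q_proj(x).view(b, t, n_q_heads, head_dim).transpose(1, 2)
    k = k_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    v = v_proj(x).view(b, t, n_kv_heads, head_dim).transpose(1, 2)
    # Each group of 4 query heads (32 / 8) shares one KV head, so the KV
    # cache is 4x smaller than in full multi-head attention.
    k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return o_proj(out.transpose(1, 2).reshape(b, t, n_q_heads * head_dim))

print(gqa(torch.randn(1, 16, d_model)).shape)  # torch.Size([1, 16, 5120])
```

The smaller KV cache that GQA buys is part of what makes the 128k context window practical to serve.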
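The function-calling support can be exercised through Hugging Face chat templates. The sketch below is a hedged example: it assumes the mistralai/Mistral-Nemo-Instruct-2407 checkpoint on the Hugging Face Hub and a recent transformers release whose apply_chat_template accepts a tools argument; get_weather is a hypothetical function used only to show the shape of the API.

```python
# Hedged function-calling sketch; get_weather is a hypothetical tool.
from transformers import AutoTokenizer

def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    ...

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[get_weather],  # JSON schema is derived from signature and docstring
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # the rendered prompt should embed the tool schema
```

The model can then emit a structured tool call, which your application parses, executes, and feeds back as a tool-result message.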
Good For
- General-purpose conversational AI: Its instruction-tuned nature makes it suitable for chatbots and interactive applications (see the quickstart sketch at the end of this section).
- Multilingual applications: Strong performance on multilingual benchmarks indicates its utility for global use cases.
- Code-related tasks: Training on code data suggests proficiency in code generation, code understanding, and related tasks.
- Developers seeking a powerful open-source model: Positioned as a drop-in replacement in systems built around Mistral 7B, while offering substantially stronger capabilities.
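As a starting point for the conversational use cases above, here is a minimal quickstart sketch using the transformers pipeline API. It assumes the mistralai/Mistral-Nemo-Instruct-2407 checkpoint on the Hugging Face Hub and hardware with enough memory for a 12B model in bfloat16; note that Mistral's model card recommends a lower-than-usual sampling temperature (around 0.3) for this model.

```python
# Minimal chat quickstart sketch; adjust dtype/device to your hardware.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
messages = [
    {"role": "user", "content": "Explain grouped-query attention in two sentences."}
]
out = chat(messages, max_new_tokens=128, do_sample=True, temperature=0.3)
print(out[0]["generated_text"][-1]["content"])  # assistant reply
```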