Mistral-Nemo-Instruct-2407 Overview
Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model developed jointly by Mistral AI and NVIDIA. It is an instruct fine-tune of Mistral-Nemo-Base-2407 and is designed for strong performance, notably outperforming other models of comparable or smaller size. The model is released under the Apache 2.0 license, allowing broad commercial and research use.
Key Capabilities & Features
- Extensive Context Window: Features a substantial 128k context window, allowing for processing and understanding of long inputs.
- Multilingual and Code Training: Trained on a diverse dataset including a large proportion of multilingual and code data, enhancing its versatility across different domains and languages.
- Architectural Efficiency: Uses a transformer architecture with 40 layers, 32 attention heads, and 8 KV heads (grouped-query attention, GQA), alongside a 128k vocabulary and rotary position embeddings (theta = 1M).
- Strong Benchmark Performance: Achieves notable scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and 73.8% on TriviaQA (5-shot). It also demonstrates solid multilingual MMLU scores across French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese.
- Framework Compatibility: Supports usage with mistral_inference, transformers, and NeMo frameworks, providing flexible deployment options.
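The GQA configuration listed above (32 query heads sharing 8 KV heads) translates directly into memory savings for the KV cache at long context. A rough back-of-the-envelope sketch, where the head dimension of 128 and fp16 cache precision are illustrative assumptions rather than figures from this card:

```python
# Back-of-the-envelope KV-cache size comparison for grouped-query attention (GQA).
# Layer/head counts come from the architecture bullet above; head_dim=128 and
# 2-byte (fp16) cache entries are assumptions for illustration.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_entry=2):
    # K and V are each cached once per layer:
    # 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes_per_entry
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_entry

layers, q_heads, kv_heads, head_dim = 40, 32, 8, 128
seq_len = 128_000  # the full 128k context window

mha = kv_cache_bytes(layers, q_heads, head_dim, seq_len)   # if all 32 heads had their own KV
gqa = kv_cache_bytes(layers, kv_heads, head_dim, seq_len)  # GQA with 8 shared KV heads

print(f"Full-MHA KV cache: {mha / 2**30:.1f} GiB")
print(f"GQA KV cache:      {gqa / 2**30:.1f} GiB")
print(f"Reduction factor:  {q_heads // kv_heads}x")
```

Under these assumptions, sharing KV heads 4-to-1 cuts the KV cache to a quarter of what full multi-head attention would require, which is what makes the 128k window practical on commodity accelerators.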
Use Cases
This model is well-suited for general instruction-following tasks, multilingual applications, and code generation or understanding. Its benchmark performance and long context window make it a strong candidate for applications that require robust language capabilities over extended inputs.