grimjim/mistralai-Mistral-Nemo-Instruct-2407

Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Jul 22, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model developed jointly by Mistral AI and NVIDIA. It features a 128k context window and was trained on a large proportion of multilingual and code data, making it versatile across languages and programming tasks. The model outperforms existing models of similar or smaller size and is intended as a drop-in replacement for Mistral 7B. It is released under the Apache 2.0 license and excels at general language understanding and generation.


Mistral-Nemo-Instruct-2407 Overview

Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model built in collaboration between Mistral AI and NVIDIA. It is the instruction-tuned variant of Mistral-Nemo-Base-2407 and delivers strong performance, outperforming other models of comparable or smaller scale. The model is released under the Apache 2.0 license, ensuring broad usability.

Key Capabilities & Features

  • Extensive Context Window: Features a substantial 128k context window, allowing for processing and understanding of long inputs.
  • Multilingual and Code Training: Trained on a diverse dataset including a large proportion of multilingual and code data, enhancing its versatility across different domains and languages.
  • Architectural Efficiency: Utilizes a transformer architecture with 40 layers, 32 attention heads, and 8 KV heads (grouped-query attention), a ~128k vocabulary, and rotary position embeddings (theta = 1M).
  • Strong Benchmark Performance: Achieves notable scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and 73.8% on TriviaQA (5-shot). It also demonstrates solid multilingual MMLU scores across French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese.
  • Framework Compatibility: Supports usage with mistral_inference, transformers, and NeMo frameworks, providing flexible deployment options.
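The GQA figures above translate directly into KV-cache savings at inference time. As a rough back-of-the-envelope sketch (assuming a head dimension of 128, which is not stated in this card; the layer and head counts come from the bullet list above):

```python
# Rough KV-cache size estimate from the architecture numbers above.
# HEAD_DIM = 128 is an assumption (not stated in this card).
N_LAYERS = 40
N_HEADS = 32         # query heads
N_KV_HEADS = 8       # KV heads under grouped-query attention (GQA)
HEAD_DIM = 128       # assumed
BYTES_PER_VALUE = 1  # FP8, matching the quantization listed above

def kv_cache_bytes_per_token(n_kv_heads: int) -> int:
    # 2x for the separate key and value tensors stored per layer
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE

gqa = kv_cache_bytes_per_token(N_KV_HEADS)  # with 8 KV heads
mha = kv_cache_bytes_per_token(N_HEADS)     # what full MHA would cost
print(gqa, mha // gqa)  # → 81920 4 (GQA cuts the KV cache 4x here)
```

Under these assumptions, a 32k-token sequence (the serving context listed above) would need roughly 81,920 × 32,768 ≈ 2.5 GiB of KV cache, versus about 10 GiB with full multi-head attention.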

Use Cases

This model is well-suited for general instruction-following tasks, multilingual applications, and code-related generation or understanding. Its performance and context length make it a strong candidate for applications requiring robust language capabilities and efficient processing.