Mistral-Nemo-Base-2407: A Powerful 12B Multilingual and Code-Optimized Model
Mistral-Nemo-Base-2407 is a 12-billion-parameter large language model developed jointly by Mistral AI and NVIDIA and released under the Apache 2.0 license. It outperforms many existing models of similar or smaller size and is designed as a drop-in replacement for Mistral 7B.
Key Capabilities
- Extended Context Window: Trained with a 128k-token context window, enabling it to process and generate long, coherent texts.
- Multilingual and Code Proficiency: Trained on a large proportion of multilingual and code data, making it effective across diverse language tasks and code generation.
- Strong General Benchmarks: Achieves competitive scores across various benchmarks, including 83.5% on HellaSwag (0-shot), 68.0% on MMLU (5-shot), and 73.8% on TriviaQA (5-shot).
- Multilingual MMLU Performance: Demonstrates solid performance in multiple languages, with scores around 60-64% for French, German, Spanish, Italian, and Portuguese, and around 59% for Russian, Chinese, and Japanese.
- Efficient Architecture: Uses a 40-layer transformer with 32 attention heads sharing 8 key-value heads via grouped-query attention (GQA), which reduces KV-cache memory at inference time, plus a large vocabulary of roughly 128k tokens; the configuration check after this list shows one way to verify these values.
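To confirm these architecture details locally, the checkpoint published on the Hugging Face Hub as mistralai/Mistral-Nemo-Base-2407 can be inspected with the transformers library. A minimal sketch, assuming transformers is installed and the Hub is reachable (field names follow the standard transformers Mistral config; the expected values in the comments come from the model card):

```python
from transformers import AutoConfig

# Fetch the published configuration for the base checkpoint.
config = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

print(config.num_hidden_layers)    # 40 transformer layers
print(config.num_attention_heads)  # 32 query heads
print(config.num_key_value_heads)  # 8 key-value heads (grouped-query attention)
print(config.vocab_size)           # ~128k entries (2^17 = 131072)
```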
Good For
- Multilingual Applications: Ideal for tasks requiring understanding and generation in multiple languages due to its extensive multilingual training.
- Code Generation and Understanding: Well-suited for developers and applications involving code, given its significant training on code data.
- Long-Context Tasks: Excellent for scenarios demanding a deep understanding of lengthy inputs, thanks to its 128k context window.
- General Text Generation: A robust base model for a wide array of generative text tasks, serving as an upgraded alternative to Mistral 7B (see the minimal usage sketch after this list).
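As a starting point for any of these uses, the sketch below shows plain text completion with transformers. It is an illustration under stated assumptions, not a definitive recipe: it assumes hardware with enough memory for the 12B weights in bf16 and the accelerate package for device placement. Note that this is a base model with no chat template, and the model card recommends lower sampling temperatures (around 0.3) than earlier Mistral models:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: hardware supports bf16
    device_map="auto",           # requires the `accelerate` package
)

# Base model: plain completion, no instruction/chat formatting.
prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,  # model card suggests lower temperatures than Mistral 7B
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The example prompt is an arbitrary code-completion stub chosen to exercise the model's code training; any plain-text prefix works the same way.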