grimjim/mistralai-Mistral-Nemo-Base-2407

Text generation · Concurrency cost: 1 · Model size: 12B · Quant: FP8 · Context length: 32k · Published: Aug 3, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

Mistral-Nemo-Base-2407 is a 12-billion-parameter generative text model developed jointly by Mistral AI and NVIDIA. It features a 128k context window and was trained on a substantial proportion of multilingual and code data. The model is designed as a drop-in replacement for Mistral 7B, offering stronger performance across text generation tasks and excelling in particular at multilingual understanding and code-related applications.


Mistral-Nemo-Base-2407: A Powerful 12B Multilingual and Code-Optimized Model

Mistral-Nemo-Base-2407 is a 12-billion-parameter large language model jointly developed by Mistral AI and NVIDIA and released under the Apache 2.0 license. It outperforms many existing models of similar or smaller size and is designed as a direct, drop-in replacement for Mistral 7B.

Key Capabilities

  • Extended Context Window: Trained with a 128k context window, allowing it to process and generate longer, more coherent texts.
  • Multilingual and Code Proficiency: Benefits from training on a large proportion of multilingual and code data, making it highly effective for diverse language tasks and code generation.
  • Strong General Benchmarks: Achieves competitive scores across various benchmarks, including 83.5% on HellaSwag (0-shot), 68.0% on MMLU (5-shot), and 73.8% on TriviaQA (5-shot).
  • Multilingual MMLU Performance: Demonstrates solid performance in multiple languages, with scores around 60-64% for French, German, Spanish, Italian, and Portuguese, and around 59% for Russian, Chinese, and Japanese.
  • Efficient Architecture: A 40-layer transformer using grouped-query attention (GQA) with 8 key-value heads and a large vocabulary of roughly 128k tokens, designed for efficient inference.
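The GQA design in the last bullet is what makes the long context practical: during generation only the 8 key-value heads are cached, not the full set of query heads. A back-of-the-envelope sketch of the cache savings, using the figures above plus assumed config values (head dimension 128, 32 query heads) that are illustrative rather than taken from this page:

```python
# Rough KV-cache size estimate for a Mistral-Nemo-style model.
# Layer count and KV-head count come from the bullets above; head_dim
# and the 32-query-head figure are assumptions for illustration.

def kv_cache_bytes(seq_len, n_layers=40, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2):  # 2 bytes per element in fp16/bf16
    # Factor of 2 covers the separate key and value tensors per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

full_ctx = 128 * 1024  # the 128k-token context window
gqa = kv_cache_bytes(full_ctx)                 # 8 KV heads (GQA)
mha = kv_cache_bytes(full_ctx, n_kv_heads=32)  # hypothetical full MHA

print(f"GQA cache at 128k tokens:      {gqa / 2**30:.1f} GiB")
print(f"Full-MHA cache would be about: {mha / 2**30:.1f} GiB ({mha // gqa}x)")
```

Under these assumptions the full-context cache shrinks by 4x relative to standard multi-head attention, which is the main reason GQA is used at this scale.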

Good For

  • Multilingual Applications: Ideal for tasks requiring understanding and generation in multiple languages due to its extensive multilingual training.
  • Code Generation and Understanding: Well-suited for developers and applications involving code, given its significant training on code data.
  • Long-Context Tasks: Excellent for scenarios demanding a deep understanding of lengthy inputs, thanks to its 128k context window.
  • General Text Generation: A robust base model for a wide array of generative text tasks, serving as an upgraded alternative to Mistral 7B.
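For readers who want to try the model locally, a minimal loading sketch using Hugging Face transformers is below. The model id points at the upstream checkpoint this deployment derives from; the sampling values are assumptions (Mistral suggests a low temperature, around 0.3, for the Nemo family), not settings taken from this page.

```python
# Minimal sketch: load the upstream base checkpoint with transformers.
# Model id and sampling values are assumptions, not from this page.
MODEL_ID = "mistralai/Mistral-Nemo-Base-2407"

# As a base (non-instruct) model it does plain text continuation, so
# give it text to complete rather than chat-style instructions.
SAMPLING = {"max_new_tokens": 128, "do_sample": True, "temperature": 0.3}

RUN_DEMO = False  # flip to True on a machine with ~24 GB of GPU memory

if RUN_DEMO:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    out = model.generate(**inputs.to(model.device), **SAMPLING)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that the hosted endpoint described above serves an FP8 quant with a 32k context; loading the bf16 weights yourself as sketched here is what unlocks the full 128k window, hardware permitting.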