Mistral-Nemo-Base-2407 Overview
Mistral-Nemo-Base-2407 is a 12-billion-parameter large language model trained jointly by Mistral AI and NVIDIA. This pretrained generative text model is released under the Apache 2.0 license and significantly outperforms existing models of similar or smaller size.
Key Capabilities & Features
- Architecture: Transformer model with 40 layers, 32 attention heads, and 8 KV-heads (grouped-query attention), using SwiGLU activation and rotary embeddings (theta = 1M); see the configuration sketch after this list.
- Context Window: Features an extensive 128k context window, enabling processing of longer inputs.
- Multilingual & Code Training: Trained on a substantial dataset including multilingual and code data, enhancing its versatility.
- Performance: Achieves competitive scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and 73.8% on TriviaQA (5-shot). It also demonstrates strong multilingual MMLU performance across languages like French (62.3%), German (62.7%), and Spanish (64.6%).
- Compatibility: Uses a standard architecture, making it a drop-in replacement for Mistral 7B in existing workflows.
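To verify the architecture parameters listed above, the sketch below loads the published configuration via the Hugging Face transformers library; the field names follow the standard Mistral config schema, and the commented values simply restate the figures from this list.

```python
from transformers import AutoConfig

# Load the model's published configuration and inspect the architecture
# fields mentioned above (layers, heads, KV-heads, rotary theta).
config = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

print(config.num_hidden_layers)    # 40 transformer layers
print(config.num_attention_heads)  # 32 attention heads
print(config.num_key_value_heads)  # 8 KV-heads (grouped-query attention)
print(config.rope_theta)           # rotary embedding base, theta = 1,000,000
```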
Usage & Integration
The model supports integration with multiple frameworks, including mistral_inference, transformers, and NeMo. Unlike earlier Mistral models, a lower sampling temperature of 0.3 is recommended. As a base model, Mistral-Nemo-Base-2407 does not include built-in moderation mechanisms.
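As a minimal sketch of the transformers integration described above: the model id matches the published checkpoint, the temperature follows the 0.3 recommendation, and the prompt and remaining generation settings are illustrative choices rather than prescribed values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Base-2407"

# Load tokenizer and model (bfloat16, placed on available accelerators).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Generate a short completion; temperature 0.3 per the recommendation above.
inputs = tokenizer("The history of rocketry begins", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```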