Mistral-Nemo-Base-2407: A Powerful 12B Multilingual and Code-Optimized Model
Mistral-Nemo-Base-2407 is a 12-billion-parameter large language model developed jointly by Mistral AI and NVIDIA and released under the Apache 2.0 license. It outperforms many existing models of similar or smaller size and is designed as a drop-in replacement for Mistral 7B.
Key Capabilities
- Extended Context Window: Trained with a 128k-token context window, enabling it to process and generate long, coherent texts.
- Multilingual and Code Proficiency: Trained on a large proportion of multilingual and code data, making it effective across diverse language tasks and code generation.
- Strong General Benchmarks: Achieves competitive scores across various benchmarks, including 83.5% on HellaSwag (0-shot), 68.0% on MMLU (5-shot), and 73.8% on TriviaQA (5-shot).
- Multilingual MMLU Performance: Demonstrates solid performance in multiple languages, with scores around 60-64% for French, German, Spanish, Italian, and Portuguese, and around 59% for Russian, Chinese, and Japanese.
- Efficient Architecture: Uses a 40-layer transformer with 32 attention heads sharing 8 key-value heads via grouped-query attention (GQA), which reduces KV-cache memory at inference time, plus a large vocabulary of roughly 128k tokens; the configuration check after this list shows one way to verify these values.
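To confirm these architecture details locally, the checkpoint published on the Hugging Face Hub as mistralai/Mistral-Nemo-Base-2407 can be inspected with the transformers library. A minimal sketch, assuming transformers is installed and the Hub is reachable (field names follow the standard transformers Mistral config; the expected values in the comments come from the model card):

```python
from transformers import AutoConfig

# Fetch the published configuration for the base checkpoint.
config = AutoConfig.from_pretrained("mistralai/Mistral-Nemo-Base-2407")

print(config.num_hidden_layers)    # 40 transformer layers
print(config.num_attention_heads)  # 32 query heads
print(config.num_key_value_heads)  # 8 key-value heads (grouped-query attention)
print(config.vocab_size)           # ~128k entries (2^17 = 131072)
```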
Good For
- Multilingual Applications: Ideal for tasks requiring understanding and generation in multiple languages due to its extensive multilingual training.
- Code Generation and Understanding: Well-suited for developers and applications involving code, given its significant training on code data.
- Long-Context Tasks: Excellent for scenarios demanding a deep understanding of lengthy inputs, thanks to its 128k context window.
- General Text Generation: A robust base model for a wide array of generative text tasks, serving as an upgraded alternative to Mistral 7B (see the minimal usage sketch after this list).
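As a starting point for any of these uses, the sketch below shows plain text completion with transformers. It is an illustration under stated assumptions, not a definitive recipe: it assumes hardware with enough memory for the 12B weights in bf16 and the accelerate package for device placement. Note that this is a base model with no chat template, and the model card recommends lower sampling temperatures (around 0.3) than earlier Mistral models:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Base-2407"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: hardware supports bf16
    device_map="auto",           # requires the `accelerate` package
)

# Base model: plain completion, no instruction/chat formatting.
prompt = "def fibonacci(n: int) -> int:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.3,  # model card suggests lower temperatures than Mistral 7B
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The example prompt is an arbitrary code-completion stub chosen to exercise the model's code training; any plain-text prefix works the same way.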