mistralai/Mistral-Nemo-Instruct-2407

Status: Warm · Visibility: Public · Parameters: 12B · Quantization: FP8 · Context length: 32,768 tokens · License: apache-2.0 · Source: Hugging Face
Overview

Mistral-Nemo-Instruct-2407: A Powerful Instruction-Tuned LLM

Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model developed jointly by Mistral AI and NVIDIA. It is fine-tuned from Mistral-Nemo-Base-2407 and is designed to deliver strong performance, often outperforming models of similar or smaller size.
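Mistral's model card recommends a lower-than-usual sampling temperature (around 0.3) for this model. As a minimal sketch of basic chat usage, assuming a recent transformers release and hardware that fits a 12B model in bfloat16:

```python
import torch
from transformers import pipeline

# Minimal chat sketch for Mistral-Nemo-Instruct-2407 via the transformers
# text-generation pipeline.
chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain grouped-query attention in two sentences."}]
# Mistral recommends a low temperature (~0.3) for this model.
out = chat(messages, max_new_tokens=128, do_sample=True, temperature=0.3)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```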

Key Capabilities & Features

  • Architecture: Transformer model with 40 layers, 5,120 hidden dimensions, and Grouped-Query Attention (GQA) with 8 KV-heads.
  • Context Window: Features a substantial 128k context window, enabling processing of longer inputs and maintaining conversational coherence.
  • Multilingual & Code Data: Trained on a significant proportion of multilingual and code data, enhancing its versatility across different languages and programming tasks.
  • Instruction Following: Specifically fine-tuned for instruction following, making it highly effective for chat and command-based interactions.
  • Benchmarks: Achieves competitive scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and strong multilingual MMLU scores (e.g., 62.3% French, 62.7% German).
  • Tool Use/Function Calling: Supports function calling, allowing integration with external tools and APIs (see the sketch after this list).
  • Licensing: Released under the permissive Apache 2.0 license.
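As a hedged illustration of the function-calling support, the sketch below routes a hypothetical `get_weather` tool through the tokenizer's chat template; it assumes the checkpoint's chat template accepts the `tools` argument, as recent transformers versions do for Mistral chat templates:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"

def get_weather(city: str) -> str:
    """
    Get the current weather for a city. (Hypothetical tool for illustration.)

    Args:
        city: Name of the city to look up.
    """
    return "sunny"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What's the weather in Paris right now?"}]
# The chat template converts the function signature and docstring into a
# JSON tool schema and exposes it to the model.
inputs = tok.apply_chat_template(
    messages, tools=[get_weather], add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If the template and weights behave as assumed, the reply is a tool-call payload naming `get_weather` with `{"city": "Paris"}`, which the caller executes before passing the result back to the model in a follow-up turn.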

Good For

  • General-purpose conversational AI: Its instruction-tuned nature makes it suitable for chatbots and interactive applications.
  • Multilingual applications: Strong performance on multilingual benchmarks indicates its utility for global use cases.
  • Code-related tasks: Training on a large proportion of code data suggests proficiency in code generation, code understanding, and related tasks.
  • Developers seeking a powerful, open-source alternative: Positioned as a drop-in replacement for Mistral 7B, offering enhanced capabilities.