SillyTilly/mistralai_Mistral-Nemo-Instruct-2407
The Mistral-Nemo-Instruct-2407 is an instruct fine-tuned large language model developed jointly by Mistral AI and NVIDIA, based on the Mistral-Nemo-Base-2407 architecture. It features a 128k context window and is trained on a large proportion of multilingual and code data. This model is designed as a drop-in replacement for Mistral 7B, offering strong performance across various benchmarks including MMLU (68.0%) and HellaSwag (83.5%). Its primary strength lies in its instruction-following capabilities and multilingual support.
Mistral-Nemo-Instruct-2407: An Overview
The Mistral-Nemo-Instruct-2407 is an instruct fine-tuned Large Language Model (LLM) developed collaboratively by Mistral AI and NVIDIA. It is based on Mistral-Nemo-Base-2407 and is released under the Apache 2.0 license. The model is notable for its robust performance, often outperforming other models of similar or smaller size.
Key Capabilities & Features
- Extensive Context Window: Trained with a substantial 128k context window, allowing for processing longer inputs and maintaining coherence over extended conversations.
- Multilingual & Code Proficiency: Benefits from training on a significant proportion of multilingual and code data, enhancing its versatility across different languages and programming tasks.
- Strong Benchmark Performance: Achieves competitive scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and 76.8% on Winogrande (0-shot). It also demonstrates solid multilingual MMLU scores across French, German, Spanish, and other languages.
- Architectural Details: Features a transformer architecture with 40 layers, a model dimension of 5,120, and a vocabulary of approximately 128k tokens, using Grouped Query Attention (GQA) with 8 KV heads.
- Instruction Following & Function Calling: Designed for effective instruction following and supports function calling, making it suitable for interactive applications and tool integration.
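The instruction-following interface above is typically driven through Mistral's `[INST] ... [/INST]` chat format. As a minimal sketch, the function below renders a message list into that format by hand; the exact control tokens are assumptions based on the Mistral instruct convention, and in real code `tokenizer.apply_chat_template` from `transformers` should be used instead.

```python
def build_instruct_prompt(messages):
    """Render {role, content} messages into a Mistral-style
    [INST] ... [/INST] prompt (illustrative only; prefer
    tokenizer.apply_chat_template in production code)."""
    prompt = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            prompt += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            # Assistant turns are closed with an end-of-sequence token.
            prompt += f" {msg['content']}</s>"
    return prompt

prompt = build_instruct_prompt(
    [{"role": "user", "content": "Translate 'good morning' to German."}]
)
print(prompt)  # <s>[INST] Translate 'good morning' to German. [/INST]
```

Passing the rendered string to the model's tokenizer then yields the token IDs for generation.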
When to Use This Model
- Instruction-tuned applications: Ideal for tasks requiring precise instruction following.
- Multilingual use cases: Strong performance across multiple languages makes it suitable for global applications.
- Code-related tasks: Its training on code data suggests good capabilities for code generation or understanding.
- As a Mistral 7B replacement: Positioned as a direct, enhanced replacement for Mistral 7B, offering improved performance and features.
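For the tool-integration use case, function calling usually means passing the model a JSON tool schema and parsing the JSON call it emits. The sketch below illustrates that round trip with a hypothetical `get_weather` tool; the field names follow the JSON-schema style common to function-calling APIs and are assumptions, not the exact schema this model was trained on.

```python
import json

# Hypothetical tool definition (names and structure are illustrative).
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def parse_tool_call(raw):
    """Parse a JSON tool call emitted by the model and check that all
    required arguments declared in the tool schema are present."""
    call = json.loads(raw)
    required = weather_tool["function"]["parameters"]["required"]
    missing = [k for k in required if k not in call.get("arguments", {})]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return call["name"], call["arguments"]

name, args = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
print(name, args["city"])  # get_weather Paris
```

In a real pipeline the parsed name and arguments would be dispatched to the actual tool, and its result fed back to the model as a follow-up message.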