SillyTilly/mistralai_Mistral-Nemo-Instruct-2407

Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Context Length: 32k · Published: Jul 18, 2024 · License: apache-2.0 · Architecture: Transformer

Mistral-Nemo-Instruct-2407 is an instruct fine-tuned large language model developed jointly by Mistral AI and NVIDIA, fine-tuned from Mistral-Nemo-Base-2407. It features a 128k context window and was trained on a large proportion of multilingual and code data. The model is designed as a drop-in replacement for Mistral 7B, offering strong performance across benchmarks including MMLU (68.0%) and HellaSwag (83.5%). Its primary strengths are instruction following and multilingual support.


Mistral-Nemo-Instruct-2407: An Overview

Mistral-Nemo-Instruct-2407 is an instruct fine-tuned Large Language Model (LLM) developed collaboratively by Mistral AI and NVIDIA. It is based on Mistral-Nemo-Base-2407 and is released under the Apache 2.0 license. The model is notable for robust performance, often outperforming other models of similar or smaller scale.

Key Capabilities & Features

  • Extensive Context Window: Trained with a substantial 128k context window, allowing for processing longer inputs and maintaining coherence over extended conversations.
  • Multilingual & Code Proficiency: Benefits from training on a significant proportion of multilingual and code data, enhancing its versatility across different languages and programming tasks.
  • Strong Benchmark Performance: Achieves competitive scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and 76.8% on Winogrande (0-shot). It also demonstrates solid multilingual MMLU scores across French, German, Spanish, and other languages.
  • Architectural Details: Features a transformer architecture with 40 layers, a hidden dimension of 5,120, and a vocabulary size of approximately 128k, utilizing Grouped Query Attention (GQA) with 8 KV-heads.
  • Instruction Following & Function Calling: Designed for effective instruction following and supports function calling, making it suitable for interactive applications and tool integration.
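The GQA figures above make serving memory easy to estimate: fewer KV-heads mean a smaller KV cache. A minimal back-of-the-envelope sketch, assuming a head dimension of 128 (not stated on this card) and 1 byte per element for FP8:

```python
def kv_cache_bytes(seq_len, n_layers=40, n_kv_heads=8, head_dim=128, bytes_per_elem=1):
    """Estimate the KV-cache size of a GQA transformer.

    n_layers and n_kv_heads come from the architectural details above;
    head_dim=128 and the 1-byte FP8 element size are assumptions made
    for illustration. The factor of 2 covers the separate key and value
    caches stored per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# A full 128k-token context at FP8 works out to about 10 GiB of KV cache
print(kv_cache_bytes(128 * 1024) / 2**30)
```

This is why GQA matters at long context: with the full 32 query heads also caching K/V, the same estimate would be four times larger.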

When to Use This Model

  • Instruction-tuned applications: Ideal for tasks requiring precise instruction following.
  • Multilingual use cases: Strong performance across multiple languages makes it suitable for global applications.
  • Code-related tasks: Training on a significant share of code data gives it solid code generation and comprehension capabilities.
  • As a Mistral 7B replacement: Positioned as a direct, enhanced replacement for Mistral 7B, offering improved performance and features.
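For instruction-tuned use, prompts follow the Mistral `[INST]` chat format. The sketch below is an approximation of that format for illustration only; in practice the authoritative template ships with the model's tokenizer and should be applied via `tokenizer.apply_chat_template` (exact spacing and special tokens may differ):

```python
def build_prompt(turns):
    """Approximate the Mistral [INST] chat format for a list of
    (user, assistant) turns, where the final turn's assistant reply is
    None because the model is expected to generate it.

    Illustrative only: real code should use apply_chat_template from
    the transformers tokenizer rather than hand-rolling the template.
    """
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST]{user}[/INST]"
        if assistant is not None:
            # Completed assistant turns are closed with the EOS token
            prompt += f"{assistant}</s>"
    return prompt

# A single-turn prompt awaiting the model's reply
print(build_prompt([("Who are you?", None)]))
```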

Popular Sampler Settings

Featherless users most often tune the following sampler parameters for this model:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
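As a rough illustration of what these knobs do, here is a minimal plain-Python sketch of how top-k, top-p, and min-p filtering restrict the candidate token set before sampling, with temperature scaling the logits first. The filtering order shown (k, then p, then min-p) is one common convention, and this is an illustrative reimplementation, not Featherless's actual sampling code; the penalty parameters, which down-weight already-generated tokens, are omitted for brevity.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def filter_candidates(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return the token ids that survive top-k / top-p / min-p filtering,
    sorted from most to least likely."""
    scaled = [x / temperature for x in logits]
    probs = softmax(scaled)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # top-k: keep only the k most likely tokens (0 disables the filter)
    if top_k > 0:
        order = order[:top_k]

    # top-p (nucleus): keep the smallest prefix whose cumulative mass
    # reaches top_p
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # min-p: drop tokens whose probability falls below min_p times the
    # probability of the most likely token
    threshold = min_p * probs[order[0]]
    return [i for i in kept if probs[i] >= threshold]
```

Lower temperature and tighter top_p/top_k make outputs more deterministic; min_p adapts the cutoff to how confident the model is at each step.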