grimjim/mistralai-Mistral-Nemo-Instruct-2407

Text Generation · Concurrency Cost: 1 · Model Size: 12B · Quant: FP8 · Ctx Length: 32k · Published: Jul 22, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model developed jointly by Mistral AI and NVIDIA. It features a 128k context window and was trained on a large proportion of multilingual and code data, making it versatile across languages and programming tasks. The model outperforms existing models of similar or smaller size and is intended as a drop-in replacement for Mistral 7B. It is released under the Apache 2.0 license and excels at general language understanding and generation.


Mistral-Nemo-Instruct-2407 Overview

Mistral-Nemo-Instruct-2407 is a 12-billion-parameter instruction-tuned large language model built in collaboration between Mistral AI and NVIDIA. It is the instruction-tuned variant of Mistral-Nemo-Base-2407 and delivers strong performance, outperforming other models of comparable or smaller scale. The model is released under the Apache 2.0 license, ensuring broad usability.

Key Capabilities & Features

  • Extensive Context Window: Features a substantial 128k context window, allowing for processing and understanding of long inputs.
  • Multilingual and Code Training: Trained on a diverse dataset including a large proportion of multilingual and code data, enhancing its versatility across different domains and languages.
  • Architectural Efficiency: Utilizes a transformer architecture with 40 layers, 32 attention heads, and 8 KV heads (grouped-query attention), a ~128k vocabulary, and rotary position embeddings (theta = 1M).
  • Strong Benchmark Performance: Achieves notable scores on various benchmarks, including 68.0% on MMLU (5-shot), 83.5% on HellaSwag (0-shot), and 73.8% on TriviaQA (5-shot). It also demonstrates solid multilingual MMLU scores across French, German, Spanish, Italian, Portuguese, Russian, Chinese, and Japanese.
  • Framework Compatibility: Supports usage with mistral_inference, transformers, and NeMo frameworks, providing flexible deployment options.
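The GQA figures above translate directly into KV-cache savings at inference time. As a rough back-of-the-envelope sketch (assuming a head dimension of 128, which is not stated in this card; the layer and head counts come from the bullet list above):

```python
# Rough KV-cache size estimate from the architecture numbers above.
# HEAD_DIM = 128 is an assumption (not stated in this card).
N_LAYERS = 40
N_HEADS = 32         # query heads
N_KV_HEADS = 8       # KV heads under grouped-query attention (GQA)
HEAD_DIM = 128       # assumed
BYTES_PER_VALUE = 1  # FP8, matching the quantization listed above

def kv_cache_bytes_per_token(n_kv_heads: int) -> int:
    # 2x for the separate key and value tensors stored per layer
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * BYTES_PER_VALUE

gqa = kv_cache_bytes_per_token(N_KV_HEADS)  # with 8 KV heads
mha = kv_cache_bytes_per_token(N_HEADS)     # what full MHA would cost
print(gqa, mha // gqa)  # → 81920 4 (GQA cuts the KV cache 4x here)
```

Under these assumptions, a 32k-token sequence (the serving context listed above) would need roughly 81,920 × 32,768 ≈ 2.5 GiB of KV cache, versus about 10 GiB with full multi-head attention.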

Use Cases

This model is well-suited for general instruction-following tasks, multilingual applications, and code-related generation or understanding. Its performance and context length make it a strong candidate for applications requiring robust language capabilities and efficient processing.