Name: mgoin/Mistral-Nemo-Instruct-2407-FP8-Dynamic API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: mgoin

Overview

This model, mgoin/Mistral-Nemo-Instruct-2407-FP8-Dynamic, is an FP8 quantized version of the Mistral-Nemo-Instruct-2407 LLM, developed by Mistral AI and NVIDIA. It is specifically compressed with dynamic activations for efficient use within vLLM, offering a balance of performance and resource utilization. The base model is an instruction-tuned variant of Mistral-Nemo-Base-2407, designed to significantly outperform other models of similar or smaller scale.

Key Capabilities & Features

Quantized for Efficiency: Compressed to FP8 weights with dynamic activations, ideal for high-throughput inference in vLLM.
Robust Architecture: Features 40 layers, 5,120 dimensions, and a 128k vocabulary size, utilizing a SwiGLU activation function and Grouped-Query Attention (GQA) with 8 KV-heads.
Extensive Context Window: Supports a 128k context window, enabling processing of long inputs and complex tasks.
Multilingual & Code Proficiency: Trained on a substantial amount of multilingual and code data, enhancing its versatility.
Strong Benchmark Performance: Achieves notable scores on benchmarks such as MMLU (68.0%), HellaSwag (83.5%), and Winogrande (76.8%), alongside competitive multilingual MMLU scores (e.g., French 62.3%, German 62.7%).
Apache 2.0 License: Released under a permissive license, allowing broad usage and deployment.

Usage & Deployment

This model is designed for easy integration with vLLM, and the base model can also be used with mistral_inference and Hugging Face transformers. It supports chat and function calling capabilities, making it suitable for interactive AI applications. The developers recommend using a temperature of 0.3 for optimal generation quality.

Limitations

The Mistral Nemo Instruct model is presented as a demonstration of the base model's fine-tuning potential. It currently lacks built-in moderation mechanisms, and the developers are actively seeking community engagement to implement guardrails for safe deployment in environments requiring moderated outputs.