IntervitensInc/Mistral-Nemo-Base-2407-chatml

Hosted on Hugging Face · Text generation · Model size: 12B · Quant: FP8 · Context length: 32k · Published: Jul 27, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1

IntervitensInc/Mistral-Nemo-Base-2407-chatml is a 12-billion-parameter generative text model, developed jointly by Mistral AI and NVIDIA and fine-tuned with ChatML tokens. The transformer was trained with a 128k-token context window (served here with a 32,768-token context) on a large proportion of multilingual and code data. It is designed as a drop-in replacement for Mistral 7B, offering strong performance across benchmarks, including multilingual MMLU.


Mistral-Nemo-Base-2407-chatml Overview

IntervitensInc/Mistral-Nemo-Base-2407-chatml is a 12-billion-parameter large language model built jointly by Mistral AI and NVIDIA. This version adds ChatML special tokens, making it ready for ChatML-style finetuning. The architecture is a 40-layer transformer with 32 attention heads, a 128k vocabulary, and Grouped Query Attention (GQA) with 8 KV heads.
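The GQA figures above translate directly into memory savings for the KV cache at inference time. A rough back-of-the-envelope sketch (the 128-dimensional head size is the published Mistral Nemo value, assumed here rather than stated on this page):

```python
# Back-of-the-envelope KV-cache sizing for Mistral-Nemo's GQA setup.
# Layer and head counts come from the model card above; the 128-dim
# head size is an assumption taken from the published Mistral Nemo config.

NUM_LAYERS = 40
NUM_KV_HEADS = 8      # GQA: only 8 KV heads instead of the 32 query heads
HEAD_DIM = 128        # assumed per-head dimension
BYTES_PER_VALUE = 2   # fp16 / bf16

def kv_cache_bytes_per_token(layers=NUM_LAYERS, kv_heads=NUM_KV_HEADS,
                             head_dim=HEAD_DIM, dtype_bytes=BYTES_PER_VALUE):
    """Bytes of KV cache that one token occupies (keys + values)."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

per_token = kv_cache_bytes_per_token()   # 163,840 bytes, i.e. 160 KiB per token
full_ctx = per_token * 32_768            # ~5 GiB at the 32k serving context
```

With 32 KV heads (no GQA) the cache would be four times larger, which is why the 8-KV-head design matters for long-context serving.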

Key Capabilities

  • Enhanced for Finetuning: Includes ChatML tokens, making it ready for instruction-tuning and conversational applications.
  • Strong Performance: Outperforms models of similar or smaller size, achieving 68.0% on MMLU (5-shot) and 83.5% on HellaSwag (0-shot).
  • Multilingual Proficiency: Demonstrates robust performance across multiple languages, with MMLU scores ranging from 59.0% (Chinese, Japanese) to 64.6% (Spanish).
  • Extended Context Window: Trained with a 128k context window, supporting longer interactions and complex tasks.
  • Code Data Training: Benefits from training on a significant amount of code data, enhancing its capabilities for programming-related tasks.
  • Apache 2.0 License: Released under a permissive license, allowing broad usage and integration.
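The ChatML tokens mentioned above wrap each conversation turn in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of building such a prompt by hand (role names follow the ChatML convention; in practice a tokenizer chat template typically does this for you):

```python
# Build a ChatML-formatted prompt string by hand.
# <|im_start|> and <|im_end|> are the ChatML special tokens this model adds.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GQA in one sentence."},
])
```

The trailing `<|im_start|>assistant` line leaves the prompt open so the model's completion becomes the assistant turn.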

Good For

  • Instruction Tuning: Ideal for developers looking to fine-tune a base model for specific conversational or instruction-following tasks using ChatML format.
  • Multilingual Applications: Suitable for applications requiring understanding and generation in various languages.
  • Code-Related Tasks: Its training on code data makes it a strong candidate for code generation, completion, and understanding.
  • Replacing Mistral 7B: Designed as a direct upgrade or alternative for existing Mistral 7B implementations, offering improved performance at a similar scale.

Popular Sampler Settings

The most popular parameter combinations used by Featherless users for this model vary the following sampler settings: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
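These settings map onto the fields of an OpenAI-compatible completion request. A hedged sketch of assembling such a request body (the values below are illustrative placeholders, not the actual popular configs; extended fields like repetition_penalty and min_p are supported by many open-source serving stacks but are not part of the core OpenAI schema):

```python
import json

# Illustrative sampler settings -- placeholders, not the popular configs themselves.
payload = {
    "model": "IntervitensInc/Mistral-Nemo-Base-2407-chatml",
    "prompt": "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.05,
    "min_p": 0.05,
    "max_tokens": 256,
}
body = json.dumps(payload)  # ready to POST to a completions endpoint
```

Servers that do not recognize the extended fields usually ignore or reject them, so check your serving API's parameter list before relying on repetition_penalty or min_p.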