IntervitensInc/Mistral-Nemo-Base-2407-chatml

Hosted on Hugging Face · Text generation · Model size: 12B · Quant: FP8 · Context length: 32k · Published: Jul 27, 2024 · License: apache-2.0 · Architecture: Transformer · Concurrency cost: 1

IntervitensInc/Mistral-Nemo-Base-2407-chatml is a 12-billion-parameter generative text model, developed jointly by Mistral AI and NVIDIA and fine-tuned with ChatML tokens. The transformer was trained with a 128k-token context window (served here with a 32,768-token context) on a large proportion of multilingual and code data. It is designed as a drop-in replacement for Mistral 7B, offering strong performance across benchmarks, including multilingual MMLU.


Mistral-Nemo-Base-2407-chatml Overview

IntervitensInc/Mistral-Nemo-Base-2407-chatml is a 12-billion-parameter large language model built jointly by Mistral AI and NVIDIA. This version adds ChatML special tokens, making it ready for ChatML-style finetuning. The architecture is a 40-layer transformer with 32 attention heads, a 128k vocabulary, and Grouped Query Attention (GQA) with 8 KV heads.
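The GQA figures above translate directly into memory savings for the KV cache at inference time. A rough back-of-the-envelope sketch (the 128-dimensional head size is the published Mistral Nemo value, assumed here rather than stated on this page):

```python
# Back-of-the-envelope KV-cache sizing for Mistral-Nemo's GQA setup.
# Layer and head counts come from the model card above; the 128-dim
# head size is an assumption taken from the published Mistral Nemo config.

NUM_LAYERS = 40
NUM_KV_HEADS = 8      # GQA: only 8 KV heads instead of the 32 query heads
HEAD_DIM = 128        # assumed per-head dimension
BYTES_PER_VALUE = 2   # fp16 / bf16

def kv_cache_bytes_per_token(layers=NUM_LAYERS, kv_heads=NUM_KV_HEADS,
                             head_dim=HEAD_DIM, dtype_bytes=BYTES_PER_VALUE):
    """Bytes of KV cache that one token occupies (keys + values)."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

per_token = kv_cache_bytes_per_token()   # 163,840 bytes, i.e. 160 KiB per token
full_ctx = per_token * 32_768            # ~5 GiB at the 32k serving context
```

With 32 KV heads (no GQA) the cache would be four times larger, which is why the 8-KV-head design matters for long-context serving.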

Key Capabilities

  • Enhanced for Finetuning: Includes ChatML tokens, making it ready for instruction-tuning and conversational applications.
  • Strong Performance: Outperforms models of similar or smaller size, achieving 68.0% on MMLU (5-shot) and 83.5% on HellaSwag (0-shot).
  • Multilingual Proficiency: Demonstrates robust performance across multiple languages, with MMLU scores ranging from 59.0% (Chinese, Japanese) to 64.6% (Spanish).
  • Extended Context Window: Trained with a 128k context window, supporting longer interactions and complex tasks.
  • Code Data Training: Benefits from training on a significant amount of code data, enhancing its capabilities for programming-related tasks.
  • Apache 2.0 License: Released under a permissive license, allowing broad usage and integration.
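The ChatML tokens mentioned above wrap each conversation turn in `<|im_start|>`/`<|im_end|>` markers. A minimal sketch of building such a prompt by hand (role names follow the ChatML convention; in practice a tokenizer chat template typically does this for you):

```python
# Build a ChatML-formatted prompt string by hand.
# <|im_start|> and <|im_end|> are the ChatML special tokens this model adds.

def to_chatml(messages):
    """Render a list of {role, content} dicts as a ChatML prompt."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize GQA in one sentence."},
])
```

The trailing `<|im_start|>assistant` line leaves the prompt open so the model's completion becomes the assistant turn.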

Good For

  • Instruction Tuning: Ideal for developers looking to fine-tune a base model for specific conversational or instruction-following tasks using ChatML format.
  • Multilingual Applications: Suitable for applications requiring understanding and generation in various languages.
  • Code-Related Tasks: Its training on code data makes it a strong candidate for code generation, completion, and understanding.
  • Replacing Mistral 7B: Designed as a direct upgrade or alternative for existing Mistral 7B implementations, offering improved performance at a similar scale.

Popular Sampler Settings

The most popular parameter combinations used by Featherless users for this model vary the following sampler settings: temperature, top_p, top_k, frequency_penalty, presence_penalty, repetition_penalty, and min_p.
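These settings map onto the fields of an OpenAI-compatible completion request. A hedged sketch of assembling such a request body (the values below are illustrative placeholders, not the actual popular configs; extended fields like repetition_penalty and min_p are supported by many open-source serving stacks but are not part of the core OpenAI schema):

```python
import json

# Illustrative sampler settings -- placeholders, not the popular configs themselves.
payload = {
    "model": "IntervitensInc/Mistral-Nemo-Base-2407-chatml",
    "prompt": "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n",
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.05,
    "min_p": 0.05,
    "max_tokens": 256,
}
body = json.dumps(payload)  # ready to POST to a completions endpoint
```

Servers that do not recognize the extended fields usually ignore or reject them, so check your serving API's parameter list before relying on repetition_penalty or min_p.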