Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24

License: apache-2.0
Overview

Vikhr-Nemo-12B-Instruct-R-21-09-24: Enhanced Bilingual LLM

Vikhr-Nemo-12B-Instruct-R-21-09-24 is a 12-billion-parameter large language model developed by VikhrModels, built on the Mistral-Nemo-Instruct-2407 architecture. It is specifically adapted and optimized for high-quality generation in both Russian and English, with support for other languages. The model supports a context length of up to 128k tokens, inherited from its base model.

Key Capabilities & Features

  • Bilingual Proficiency: High-quality generation in Russian and English, supported by the custom Grandmaster-PRO-MAX dataset.
  • Optimized for Diverse Tasks: Excels in reasoning, summarization, code generation, roleplay, and dialogue.
  • Advanced RAG Mode: Features a "Grounded RAG" mode, inspired by Command-R, in which the model returns the identifiers of relevant documents and grounds its answer in them. This mode requires a specific GROUNDED_SYSTEM_PROMPT and supports Markdown, HTML, or plain-text document content.
  • System Prompt Support: Allows for regulating response style, ideally using English system prompts.
  • Training Methodology: Developed with a multi-stage process: SFT on a synthetic dataset of ~150k instructions, followed by alignment via SMPO, a custom DPO variant, to improve answer quality.
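The Grounded RAG mode above relies on a specific message layout. As a rough sketch of how such a request might be assembled, assuming documents are passed as a JSON array with doc_id fields in a dedicated "documents" turn (the prompt text and exact schema here are illustrative, not the card's official GROUNDED_SYSTEM_PROMPT):

```python
import json

# Placeholder only: the model card ships the exact GROUNDED_SYSTEM_PROMPT text.
GROUNDED_SYSTEM_PROMPT = (
    "Answer the user's question using only the provided documents, "
    "and cite the doc_id of each document you rely on."
)

def build_rag_messages(question, documents):
    """Assemble a chat-template message list for a Grounded-RAG-style request.

    Each document is given a doc_id so the model can reference it
    when identifying relevant sources in its grounded answer.
    """
    docs_payload = [
        {"doc_id": i, "title": d["title"], "content": d["content"]}
        for i, d in enumerate(documents)
    ]
    return [
        {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
        {"role": "documents", "content": json.dumps(docs_payload, ensure_ascii=False)},
        {"role": "user", "content": question},
    ]

msgs = build_rag_messages(
    "When was the model released?",
    [{"title": "Release notes", "content": "Vikhr-Nemo was released on 21-09-24."}],
)
```

The resulting list would then be fed through the tokenizer's chat template; document content may be Markdown, HTML, or plain text as noted above.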

Performance Highlights

On the ru-arena-general benchmark, Vikhr-Nemo-12B-Instruct-R-21-09-24 achieved a 79.8% winrate against gpt-3.5-turbo-0125 (the 50% reference baseline). In RAG benchmarks it also performed strongly, with a judge-correct rate of 68% on in-domain questions and 92% on out-of-domain questions, outperforming gpt-4o-mini and gpt-3.5-turbo-0125 on some metrics.

Limitations & Recommendations

  • Safety: The model has a low level of safety by default, prioritizing instruction following. Users should implement their own safety measures.
  • System Prompts: Best used for style specification (e.g., "answer only in json format") and preferably in English.
  • Sampling: Use a low temperature (0.1-0.5) and top_k of 30-50 to avoid generation defects.
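The sampling recommendation above maps directly onto standard generation parameters. A minimal sketch, assuming Hugging Face transformers-style `generate()` keyword arguments (the specific values chosen here are one reasonable point inside the recommended ranges, not prescribed by the card):

```python
# Generation settings following the card's recommendation of
# low temperature (0.1-0.5) and top_k in the 30-50 range.
generation_config = {
    "do_sample": True,      # sampling must be on for temperature/top_k to apply
    "temperature": 0.3,     # within the recommended 0.1-0.5 range
    "top_k": 40,            # within the recommended 30-50 range
    "max_new_tokens": 512,  # illustrative; not specified by the card
}
```

These keys would typically be unpacked into a call such as `model.generate(**inputs, **generation_config)`.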