Vikhr-Nemo-12B-Instruct-R-21-09-24: Enhanced Bilingual LLM
Vikhr-Nemo-12B-Instruct-R-21-09-24 is a 12-billion-parameter large language model developed by VikhrModels, built on Mistral-Nemo-Instruct-2407. It is specifically adapted and optimized for high-quality generation in Russian and English, with support for other languages, and it inherits the base model's context length of up to 128k tokens.
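A minimal loading and chat sketch using Hugging Face transformers is shown below; the repository id `Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24` and the example prompt are assumptions, not taken from this card:

```python
# Minimal chat sketch (assumed repo id; verify on the Hugging Face hub).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Vikhrmodels/Vikhr-Nemo-12B-Instruct-R-21-09-24"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "Answer concisely."},   # English system prompt
    {"role": "user", "content": "Почему небо голубое?"},  # Russian query: "Why is the sky blue?"
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256, do_sample=True,
                        temperature=0.3, top_k=40)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```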
Key Capabilities & Features
- Bilingual Proficiency: High-quality generation in Russian and English, supported by the custom Grandmaster-PRO-MAX dataset.
- Optimized for Diverse Tasks: Excels in reasoning, summarization, code generation, roleplay, and dialogue.
- Advanced RAG Mode: Features a dedicated "Grounded RAG" mode, inspired by Command-R, in which the model first identifies the identifiers of relevant documents and then uses them to produce grounded answers. This mode requires a specific `GROUNDED_SYSTEM_PROMPT` and accepts document content in Markdown, HTML, or Plain Text (see the sketch after this list).
- System Prompt Support: Allows regulating response style; system prompts are ideally written in English.
- Training Methodology: Developed through a multi-stage process: SFT on a synthetic dataset of 150k instructions, followed by alignment via SMPO, a custom DPO variant, to improve answer quality.
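Below is a hedged sketch of how the Grounded RAG flow could be driven: documents are passed as a JSON list with `doc_id` fields in a dedicated chat turn, Command-R style. The exact `GROUNDED_SYSTEM_PROMPT` text and the `documents` role name are assumptions here and should be copied from the model card itself:

```python
# Hypothetical Grounded RAG message layout; the "documents" role and the
# prompt text are assumptions -- take the real ones from the model card.
import json

GROUNDED_SYSTEM_PROMPT = "..."  # placeholder: copy the exact prompt from the card

documents = [
    # Document content may be Markdown, HTML, or Plain Text.
    {"doc_id": 0, "title": "Sky color", "content": "# Rayleigh scattering\n..."},
    {"doc_id": 1, "title": "Unrelated note", "content": "Plain text about tides."},
]

messages = [
    {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
    {"role": "documents", "content": json.dumps(documents, ensure_ascii=False)},
    {"role": "user", "content": "Why is the sky blue?"},
]
# Expected flow: the first generation turn returns the ids of the relevant
# documents (e.g. something like {"relevant_doc_ids": [0]}); appending that
# reply to the conversation and generating again yields the grounded answer.
```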
Performance Highlights
On the ru-arena-general benchmark, Vikhr-Nemo-12B-Instruct-R-21-09-24 achieved a win rate of 79.8% against gpt-3.5-turbo-0125 (whose 50% win rate serves as the reference point). In RAG benchmarks it also performed strongly, scoring 68% judge-correct-percent on in-domain questions and 92% on out-of-domain questions, outperforming gpt-4o-mini and gpt-3.5-turbo-0125 on some metrics.
Limitations & Recommendations
- Safety: The model has a low level of safety by default, prioritizing instruction following. Users should implement their own safety measures.
- System Prompts: Best used for style specification (e.g., "answer only in json format") and preferably in English.
- Temperature: Recommended to run with a low temperature (0.1-0.5) and `top_k` (30-50) to avoid generation defects (see the sampling sketch below).
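The sketch below combines the last two recommendations: an English system prompt that pins the output format, plus conservative sampling via transformers' standard `GenerationConfig`. The concrete values are just one point inside the recommended ranges:

```python
# Conservative sampling per the recommendations above (values are examples
# within the stated ranges, not mandated settings).
from transformers import GenerationConfig

messages = [
    {"role": "system", "content": "Answer only in JSON format."},  # style control
    {"role": "user", "content": "List three facts about the Mistral-Nemo base model."},
]

gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.3,  # recommended range: 0.1-0.5
    top_k=40,         # recommended range: 30-50
    max_new_tokens=512,
)
# Usage: model.generate(inputs, generation_config=gen_config) with the
# tokenized chat template from the loading sketch above.
```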