Model Overview
Vikhr-Llama3.1-8B-Instruct-R-21-09-24 is an 8 billion parameter instruction-tuned large language model developed by VikhrModels. It is an enhanced version of meta-llama/Meta-Llama-3.1-8B-Instruct, primarily adapted for Russian and English languages through a multi-stage training process involving SFT and SMPO (a proprietary DPO variation).
Key Capabilities & Features
- Multilingual Generation: Optimized for high-quality outputs in Russian and English, with support for other languages, leveraging the GrandMaster-PRO-MAX dataset.
- Extended Context: Supports a context length of up to 128k tokens thanks to the base model's RoPE scaling.
- Advanced RAG Mode: Features a unique "Grounded RAG" mode with a dedicated "documents" role, enabling the model to identify and use the identifiers of relevant documents when answering user questions, inspired by Command-R.
- System Prompt Support: Response style can be controlled via system prompts.
- Diverse Use Cases: Optimized for reasoning, summarization, code generation, roleplay, and dialogue maintenance.
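The Grounded RAG input described above can be sketched as follows. The field names ("doc_id", "title", "content") and the sample payload are illustrative assumptions, not the model's documented schema:

```python
import json

# Illustrative "documents" payload for the Grounded RAG mode: a JSON
# array of dictionaries, each carrying an identifier the model can
# point to when grounding its answer. Field names are assumptions.
documents = [
    {"doc_id": 0, "title": "Shipping", "content": "Orders ship within 2 business days."},
    {"doc_id": 1, "title": "Returns", "content": "Returns are accepted for 30 days."},
]

# The documents are passed as their own chat turn under the dedicated
# "documents" role, serialized to JSON, alongside the usual user turn.
messages = [
    {"role": "documents", "content": json.dumps(documents, ensure_ascii=False)},
    {"role": "user", "content": "How long does shipping take?"},
]
```

With the model's tokenizer loaded, `tokenizer.apply_chat_template(messages, ...)` would then render this conversation into the prompt format the RAG mode expects.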
Performance & Benchmarks
The model was evaluated on ru-arena-general, VikhrModels' open-source Russian-language side-by-side (SbS) benchmark, where it achieved a 63.4% winrate against gpt-3.5-turbo-0125 (the reference model, fixed at a 50% winrate). In RAG benchmarks it also performed strongly, with a judge-correct rate of 64% on in-domain questions and 89% on out-of-domain questions, outperforming gpt-3.5-turbo-0125 in both categories.
Training Methodology
Training used a large synthetic instruction dataset (Vikhrmodels/GrandMaster-PRO-MAX) with built-in chain-of-thought (CoT) reasoning, plus a RAG grounding dataset (Vikhrmodels/Grounded-RAG-RU-v2). Alignment was achieved with SMPO, a custom preference-optimization method, applied after training a custom Reward Model and performing Rejection Sampling.
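The Rejection Sampling step can be illustrated with a minimal sketch: generate several candidate responses, score each with a reward model, and keep only the highest-scoring one. The scorer below is a toy stand-in, not the actual Vikhr Reward Model:

```python
# Toy rejection sampling: score candidate responses with a reward
# model and keep the best one as preference/training data.
def reward_model(response: str) -> float:
    # Stand-in scorer (assumption): prefers longer answers that end
    # with a period. A real RM would be a trained neural scorer.
    return len(response) + (10.0 if response.endswith(".") else 0.0)

def rejection_sample(candidates: list[str]) -> str:
    # Keep the candidate with the highest reward score.
    return max(candidates, key=reward_model)

best = rejection_sample([
    "Moscow",
    "The capital of Russia is Moscow.",
    "Capital: Moscow",
])
print(best)  # "The capital of Russia is Moscow."
```

In practice the selected responses feed the subsequent SMPO preference-optimization stage.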
Usage Recommendations
- RAG Mode: Requires the dedicated GROUNDED_SYSTEM_PROMPT and structured "documents" input (a JSON array of dictionaries).
- Safety: The model has a low safety level; users should test it independently. System prompts can partially mitigate this.
- System Prompts: Best used for specifying response style (e.g., "answer only in json format") and preferably written in English.
- Generation Settings: Recommended settings are a low temperature (0.1-0.4), beam search, and top_k in the 30-50 range.
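The recommended decoding settings map onto Hugging Face `generate()` keyword arguments roughly as follows; the specific values chosen within each suggested range are illustrative, not prescribed:

```python
# Sketch of the recommended decoding configuration. Values are one
# choice within the ranges suggested above (temperature 0.1-0.4,
# top_k 30-50), combined with beam search.
generation_kwargs = {
    "do_sample": True,     # sampling is needed for temperature/top_k to apply
    "temperature": 0.2,    # recommended range: 0.1-0.4
    "top_k": 40,           # recommended range: 30-50
    "num_beams": 2,        # beam search (beam-sample decoding when combined with sampling)
    "max_new_tokens": 512, # an assumption; set per use case
}

# With a loaded model and tokenized inputs this would be used as:
# outputs = model.generate(**inputs, **generation_kwargs)
```

`num_beams > 1` together with `do_sample=True` selects beam-sample decoding in transformers; drop `do_sample` for plain deterministic beam search.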