Model Overview
Vikhr-Llama3.1-8B-Instruct-R-21-09-24 is an 8 billion parameter instruction-tuned large language model developed by VikhrModels. It is an enhanced version of meta-llama/Meta-Llama-3.1-8B-Instruct, primarily adapted for Russian and English languages through a multi-stage training process involving SFT and SMPO (a proprietary DPO variation).
Key Capabilities & Features
- Multilingual Generation: Optimized for high-quality outputs in Russian and English, with support for other languages, leveraging the GrandMaster-PRO-MAX dataset.
- Extended Context: Supports a context length of up to 128k tokens thanks to the base model's RoPE scaling.
- Advanced RAG Mode: Features a unique "Grounded RAG" mode with a dedicated "documents" role, enabling the model to identify and use the identifiers of relevant documents when answering user questions, inspired by Command-R.
- System Prompt Support: Response style can be controlled via system prompts.
- Diverse Use Cases: Optimized for reasoning, summarization, code generation, roleplay, and dialogue maintenance.
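The Grounded RAG input described above can be sketched as follows. The field names ("doc_id", "title", "content") and the sample payload are illustrative assumptions, not the model's documented schema:

```python
import json

# Illustrative "documents" payload for the Grounded RAG mode: a JSON
# array of dictionaries, each carrying an identifier the model can
# point to when grounding its answer. Field names are assumptions.
documents = [
    {"doc_id": 0, "title": "Shipping", "content": "Orders ship within 2 business days."},
    {"doc_id": 1, "title": "Returns", "content": "Returns are accepted for 30 days."},
]

# The documents are passed as their own chat turn under the dedicated
# "documents" role, serialized to JSON, alongside the usual user turn.
messages = [
    {"role": "documents", "content": json.dumps(documents, ensure_ascii=False)},
    {"role": "user", "content": "How long does shipping take?"},
]
```

With the model's tokenizer loaded, `tokenizer.apply_chat_template(messages, ...)` would then render this conversation into the prompt format the RAG mode expects.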
Performance & Benchmarks
The model was evaluated on ru-arena-general, VikhrModels' open-source Russian-language side-by-side (SbS) benchmark, where it achieved a 63.4% winrate against gpt-3.5-turbo-0125 (the reference model, fixed at a 50% winrate). In RAG benchmarks it also performed strongly, with a judge-correct rate of 64% on in-domain questions and 89% on out-of-domain questions, outperforming gpt-3.5-turbo-0125 in both categories.
Training Methodology
Training used a large synthetic instruction dataset (Vikhrmodels/GrandMaster-PRO-MAX) with built-in chain-of-thought (CoT) reasoning, plus a RAG grounding dataset (Vikhrmodels/Grounded-RAG-RU-v2). Alignment was achieved with SMPO, a custom preference-optimization method, applied after training a custom Reward Model and performing Rejection Sampling.
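The Rejection Sampling step can be illustrated with a minimal sketch: generate several candidate responses, score each with a reward model, and keep only the highest-scoring one. The scorer below is a toy stand-in, not the actual Vikhr Reward Model:

```python
# Toy rejection sampling: score candidate responses with a reward
# model and keep the best one as preference/training data.
def reward_model(response: str) -> float:
    # Stand-in scorer (assumption): prefers longer answers that end
    # with a period. A real RM would be a trained neural scorer.
    return len(response) + (10.0 if response.endswith(".") else 0.0)

def rejection_sample(candidates: list[str]) -> str:
    # Keep the candidate with the highest reward score.
    return max(candidates, key=reward_model)

best = rejection_sample([
    "Moscow",
    "The capital of Russia is Moscow.",
    "Capital: Moscow",
])
print(best)  # "The capital of Russia is Moscow."
```

In practice the selected responses feed the subsequent SMPO preference-optimization stage.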
Usage Recommendations
- RAG Mode: Requires the dedicated GROUNDED_SYSTEM_PROMPT and structured "documents" input (a JSON array of dictionaries).
- Safety: The model has a low safety level; users should test it independently. System prompts can partially mitigate this.
- System Prompts: Best used for specifying response style (e.g., "answer only in json format") and preferably written in English.
- Generation Settings: Recommended settings are a low temperature (0.1-0.4), beam search, and top_k in the 30-50 range.
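The recommended decoding settings map onto Hugging Face `generate()` keyword arguments roughly as follows; the specific values chosen within each suggested range are illustrative, not prescribed:

```python
# Sketch of the recommended decoding configuration. Values are one
# choice within the ranges suggested above (temperature 0.1-0.4,
# top_k 30-50), combined with beam search.
generation_kwargs = {
    "do_sample": True,     # sampling is needed for temperature/top_k to apply
    "temperature": 0.2,    # recommended range: 0.1-0.4
    "top_k": 40,           # recommended range: 30-50
    "num_beams": 2,        # beam search (beam-sample decoding when combined with sampling)
    "max_new_tokens": 512, # an assumption; set per use case
}

# With a loaded model and tokenized inputs this would be used as:
# outputs = model.generate(**inputs, **generation_kwargs)
```

`num_beams > 1` together with `do_sample=True` selects beam-sample decoding in transformers; drop `do_sample` for plain deterministic beam search.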