Vikhrmodels/Vikhr-Llama3.1-8B-Instruct-R-21-09-24

Hugging Face
TEXT GENERATION · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Sep 20, 2024 · License: apache-2.0 · Architecture: Transformer · Open Weights

Vikhr-Llama3.1-8B-Instruct-R-21-09-24 is an 8 billion parameter unimodal (text-only) large language model developed by VikhrModels, based on Meta-Llama-3.1-8B-Instruct. It is specifically optimized for high-quality generation in Russian and English, featuring advanced RAG capabilities and a context window of up to 128k tokens. The model excels in reasoning, summarization, code generation, and roleplay, aiming to surpass GPT-3.5-turbo in many tasks.


Model Overview

Vikhr-Llama3.1-8B-Instruct-R-21-09-24 is an 8 billion parameter instruction-tuned large language model developed by VikhrModels. It is an enhanced version of meta-llama/Meta-Llama-3.1-8B-Instruct, primarily adapted for Russian and English languages through a multi-stage training process involving SFT and SMPO (a proprietary DPO variation).

Key Capabilities & Features

  • Multilingual Generation: Optimized for high-quality outputs in Russian and English, with support for other languages, leveraging the Grandmaster-PRO-MAX dataset.
  • Extended Context: Supports up to 128k tokens context length due to the base model's RoPE scaling.
  • Advanced RAG Mode: Features a unique "Grounded RAG" mode with a dedicated documents role, enabling the model to identify and utilize relevant document identifiers for answering user questions, inspired by Command-R.
  • System Prompt Support: Allows for regulating response style using system prompts.
  • Diverse Use Cases: Optimized for reasoning, summarization, code generation, roleplay, and dialogue maintenance.
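The system-prompt style control described above can be sketched with a standard chat-message list; the English prompt wording and the Russian user query below are illustrative examples, not text from the model card:

```python
# Sketch: regulating response style with a system prompt.
# The prompt texts here are illustrative, not from the model card.
messages = [
    {"role": "system", "content": "Answer only in JSON format."},
    {"role": "user", "content": "Кто такой Пушкин?"},  # Russian user query
]

# With a local checkout, the chat template would be applied like this
# (commented out so the sketch runs without downloading the 8B weights):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(
#     "Vikhrmodels/Vikhr-Llama3.1-8B-Instruct-R-21-09-24")
# prompt = tokenizer.apply_chat_template(
#     messages, tokenize=False, add_generation_prompt=True)

print([m["role"] for m in messages])  # → ['system', 'user']
```

Per the usage notes below, system prompts work best when written in English, even for Russian-language conversations.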

Performance & Benchmarks

The model was evaluated on VikhrModels' open-source Russian-language side-by-side (SbS) benchmark, ru-arena-general, where it achieved a 63.4% win rate against gpt-3.5-turbo-0125 (whose 50% win rate serves as the reference baseline). In RAG benchmarks, it demonstrated strong performance, with a 64% judge-correct rate for in-domain questions and 89% for out-of-domain questions, outperforming gpt-3.5-turbo-0125 in both categories.

Training Methodology

Training involved a large synthetic instructional dataset (Vikhrmodels/GrandMaster-PRO-MAX) with built-in CoT, and a RAG grounding dataset (Vikhrmodels/Grounded-RAG-RU-v2). Alignment was achieved using SMPO, a custom preference optimization method, after training a custom Reward Model and performing Rejection Sampling.

Usage Recommendations

  • RAG Mode: Requires a specific GROUNDED_SYSTEM_PROMPT and structured documents input (JSON array of dictionaries).
  • Safety: Has a low safety level; users should test independently. System prompts can partially mitigate this.
  • System Prompts: Best used for specifying response style (e.g., "answer only in json format") and preferably written in English.
  • Generation Settings: Recommended settings are a low temperature (0.1-0.4), beam search, and top_k between 30 and 50.
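A minimal sketch of the Grounded RAG input format and the recommended sampling settings. The document field names (`doc_id`, `title`, `content`) are assumptions for illustration, and the system prompt is a placeholder; the actual GROUNDED_SYSTEM_PROMPT text ships with the model card:

```python
import json

# Placeholder only -- substitute the real GROUNDED_SYSTEM_PROMPT from the
# model card before running inference.
GROUNDED_SYSTEM_PROMPT = "<GROUNDED_SYSTEM_PROMPT from the model card>"

# The documents input is a JSON array of dictionaries; these field names
# are assumed for illustration.
documents = [
    {"doc_id": 0, "title": "Doc A", "content": "First source document."},
    {"doc_id": 1, "title": "Doc B", "content": "Second source document."},
]

messages = [
    {"role": "system", "content": GROUNDED_SYSTEM_PROMPT},
    # The dedicated `documents` role carries the serialized document array:
    {"role": "documents", "content": json.dumps(documents, ensure_ascii=False)},
    {"role": "user", "content": "Какой из документов отвечает на вопрос?"},
]

# Sampling per the recommendations above: low temperature, top_k 30-50.
gen_kwargs = {"temperature": 0.3, "top_k": 40, "do_sample": True}

# Round-trip check that the documents payload is valid JSON:
assert json.loads(messages[1]["content"]) == documents
```

The model is expected to first return the identifiers of the relevant documents, which can then be used to ground a follow-up answer.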

Popular Sampler Settings

The three most popular parameter combinations used by Featherless users for this model draw on the following samplers (the specific values for each config are shown in the interactive tabs on the model page):

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p