Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it

TEXT GENERATIONConcurrency Cost:1Model Size:8BQuant:FP8Ctx Length:8kPublished:Feb 28, 2025License:yandexgpt-5-lite-8b-pretrainArchitecture:Transformer0.0K Cold

Vikhrmodels/Vikhr-YandexGPT-5-Lite-8B-it is an 8 billion parameter instruction-tuned language model based on YandexGPT-5-Lite-8B-pretrain, developed by Vikhrmodels. It is specialized for Russian language tasks, trained on the GrandMaster-PRO-MAX and Grounded-RAG-RU-v2 datasets using Supervised Fine-Tuning (SFT), and supports bilingual RU/EN interactions. This model excels in instruction following and RAG-grounded responses, particularly for Russian content.

Loading preview...

Vikhr-YandexGPT-5-Lite-8B-it Overview

Vikhr-YandexGPT-5-Lite-8B-it is an 8 billion parameter instruction-tuned model developed by Vikhrmodels, built upon the YandexGPT-5-Lite-8B-pretrain architecture. It is primarily specialized for the Russian language but offers bilingual RU/EN support.

Key Capabilities & Training

  • Instruction Following: The model was fine-tuned using Supervised Fine-Tuning (SFT) on a large (150k instructions) synthetic Russian dataset, GrandMaster-PRO-MAX, which incorporates Chain-Of-Thought (CoT) reasoning.
  • RAG Grounding: It includes a dedicated synthetic dataset, Grounded-RAG-RU-v2 (50k dialogues), designed to enhance its ability to provide answers grounded in provided documents. This RAG mode requires a specific system prompt and allows for document content in Markdown, HTML, or Plain Text.
  • Bilingual Support: While specialized in Russian, the model supports both Russian and English.

Usage Considerations

  • Safety: The model has a low level of response safety, prioritizing instruction adherence. Users should implement safety measures and test thoroughly.
  • System Prompts: System prompts are best used for specifying response style (e.g., "answer only in json format") and are recommended to be in English, as this aligns with the training data.
  • RAG Mode: Requires a specific GROUNDED_SYSTEM_PROMPT and works best with low temperatures (0.1-0.5) and top_k (30-50) to avoid generation defects.