yandex/YandexGPT-5-Lite-8B-instruct

Hugging Face
Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 8k · Published: Mar 28, 2025 · License: yandexgpt-5-lite-8b · Architecture: Transformer

YandexGPT-5-Lite-8B-instruct is an 8 billion parameter instruction-tuned large language model developed by Yandex, built on the YandexGPT 5 Lite Pretrain base. It offers a 32k token context length and was trained without third-party model weights, with alignment done through SFT and RLHF. The model shows strong results on international benchmarks and is particularly strong in scenarios requiring knowledge of Russian culture and facts, where it often surpasses comparable models such as Llama-3.1-8B-instruct and Qwen-2.5-7B-instruct.


YandexGPT-5-Lite-8B-instruct Overview

YandexGPT-5-Lite-8B-instruct is an 8 billion parameter instruction-tuned large language model developed by Yandex. It is built upon the YandexGPT 5 Lite Pretrain base and features a substantial 32k token context length, making it suitable for processing longer inputs and generating detailed responses. The model's alignment process includes Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), similar to the YandexGPT 5 Pro.

Key Capabilities & Differentiators

  • Strong Performance: Benchmarks indicate that YandexGPT 5 Lite closely matches or exceeds competitors like Llama-3.1-8B-instruct and Qwen-2.5-7B-instruct in various scenarios.
  • Russian Cultural & Factual Knowledge: A notable strength is its superior performance in tasks requiring knowledge of Russian culture and specific facts.
  • Independent Training: The model was trained from scratch without incorporating weights from any third-party models.
  • Quantized Version Available: A GGUF quantized version is provided for efficient deployment with tools like llama.cpp and ollama (see the sketch below).
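
A minimal sketch of running the GGUF build locally through the llama-cpp-python bindings. The file path, context size, and sampling settings are placeholders, and it assumes the GGUF file ships a chat template; if it does not, format the prompt manually following the dialogue template described in the Usage section below.

```python
# Sketch: loading a local GGUF build of YandexGPT-5-Lite-8B-instruct
# with llama-cpp-python. Path and settings are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="YandexGPT-5-Lite-8B-instruct.gguf",  # placeholder: path to the downloaded GGUF file
    n_ctx=8192,  # context window for this session
)

# create_chat_completion uses the chat template embedded in the GGUF metadata
# (assumed here), which should end generation at the </s> token.
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain what a context window is."}],
    max_tokens=256,
)
print(resp["choices"][0]["message"]["content"])
```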

Usage & Technical Notes

Developers can integrate YandexGPT-5-Lite-8B-instruct using popular libraries such as Hugging Face Transformers and vLLM. For full compatibility, Yandex recommends using the original sentencepiece tokenizer rather than a converted one. The model also uses a distinctive dialogue template: it is trained to generate a single response after the Ассистент:[SEP] sequence, terminated by the </s> token. Because of this template, results in interactive (multi-turn) mode may differ from generation over a fixed dialogue.
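
A minimal Transformers sketch along these lines. It assumes the Hugging Face repo ships a chat template matching the convention just described; note that the auto-converted tokenizer may differ slightly from the original sentencepiece tokenizer that Yandex recommends. Generation settings are illustrative.

```python
# Sketch: basic generation with Hugging Face Transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "yandex/YandexGPT-5-Lite-8B-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place weights on available GPU(s)
)

messages = [{"role": "user", "content": "Give one sentence about the history of Moscow."}]

# apply_chat_template renders the dialogue with the repo's template,
# which should end the prompt with the Ассистент:[SEP] sequence.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```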