ssslakter/YandexGPT-5-Lite-8B-instruct

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 8k | Tool Calling: Supported | Published: Apr 18, 2025 | License: yandexgpt-5-lite-8b | Architecture: Transformer | Cold

YandexGPT-5-Lite-8B-instruct is an 8-billion-parameter instruction-tuned causal language model developed by Yandex, built on the YandexGPT 5 Lite Pretrain base model. It has an 8192-token context length and is aligned using SFT and RLHF. The model performs strongly on international benchmarks and stands out on Russian culture and factual knowledge when compared with similar models such as Llama-3.1-8B-instruct and Qwen-2.5-7B-instruct.


YandexGPT-5-Lite-8B-instruct Overview

YandexGPT-5-Lite-8B-instruct is an 8-billion-parameter instruction-tuned large language model developed by Yandex. It is built on the YandexGPT 5 Lite Pretrain base model and does not incorporate weights from third-party models. The alignment process for this Lite version, detailed in a Habr article, combines Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), mirroring the approach used for YandexGPT 5 Pro.

Key Capabilities & Performance

  • Strong Benchmark Performance: YandexGPT 5 Lite closely matches and, in some scenarios, surpasses analogues like Llama-3.1-8B-instruct and Qwen-2.5-7B-instruct in international benchmarks and their Russian adaptations.
  • Cultural and Factual Knowledge: A notable strength is its superior performance in tasks requiring knowledge of Russian culture and facts.
  • Context Length: The model supports a context length of 8192 tokens.
  • Quantized Version Available: A quantized GGUF version is provided in a separate repository for use with tools like llama.cpp and ollama.

Unique Features

  • Custom Tokenization: The original sentencepiece tokenizer is recommended for full compatibility. Each dialogue turn is tokenized separately, with a space added at the beginning of each turn and every \n replaced with an [NL] token.
  • Non-Standard Dialogue Template: The model is trained to generate exactly one reply after the sequence Ассистент:[SEP] ("Ассистент" is Russian for "Assistant") and to end that reply with the </s> token. As a result, output in interactive mode can differ from output generated over a fixed dialogue.
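
The two features above can be illustrated at the string level. The following is a minimal Python sketch, not the model's actual tokenizer code: it assumes the leading space and the [NL] substitution can be emulated as plain string operations before tokenization, and that the Ассистент:[SEP] cue is treated like any other turn (the description does not say whether the cue itself gets the leading space). The names prepare_turn, build_segments, and trim_reply are hypothetical helpers introduced only for this example. Role labels for earlier turns are not specified here, so the sketch leaves them to the caller.

```python
NEWLINE_TOKEN = "[NL]"              # replacement for "\n", as described above
GENERATION_CUE = "Ассистент:[SEP]"  # sequence after which the model writes one reply
END_TOKEN = "</s>"                  # token that ends the reply


def prepare_turn(text: str) -> str:
    """Apply the two per-turn rules: add a leading space, turn newlines into [NL]."""
    return " " + text.replace("\n", NEWLINE_TOKEN)


def build_segments(turns: list[str]) -> list[str]:
    """Return the strings to tokenize one by one, ending with the generation cue.

    Assumption: the cue is prepared like an ordinary turn.
    """
    return [prepare_turn(turn) for turn in turns] + [prepare_turn(GENERATION_CUE)]


def trim_reply(generated: str) -> str:
    """Keep only the single reply: everything before the first </s>."""
    return generated.split(END_TOKEN, 1)[0].strip()


# Example: one user turn containing a line break.
print(build_segments(["Привет!\nКак дела?"]))
# → [' Привет![NL]Как дела?', ' Ассистент:[SEP]']
```

Each returned segment would then be passed to the tokenizer separately and the resulting token IDs concatenated. If the original sentencepiece tokenizer already applies these rules internally, this preprocessing should not be applied a second time.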