yandex/YandexGPT-5-Lite-8B-pretrain

8B parameters · FP8 · 8192 context · Feb 21, 2025 · License: other · Source: Hugging Face

YandexGPT-5-Lite-8B-pretrain Overview

YandexGPT-5-Lite-8B-pretrain is an 8-billion-parameter pre-trained large language model from Yandex with a 32k-token context window. Training proceeded in two stages: the first covered 15T tokens at context lengths up to 8k, with a data mix of 60% web pages, 15% code, and 10% mathematics, plus other specialized data such as synthetic content and Yandex service datasets. The second stage, dubbed 'Powerup', continued on 320B high-quality tokens, extending the context length to 32k and shifting the mix to 25% web pages, 19% mathematics, 18% code, and 18% educational data.

Key Capabilities & Differentiators

  • Optimized for Russian Language: The tokenizer is markedly more efficient on Russian text; a passage that fits in this model's 32k-token window would take roughly 48k tokens with a tokenizer such as Qwen2.5's (see the tokenizer comparison sketched after this list).
  • Strong Benchmark Performance: It achieves parity with or surpasses global SOTA pre-trained models across various key benchmarks, as detailed in the developer's report.
  • Llama-like Architecture: The Llama-style layout makes the model compatible with most existing LLM fine-tuning frameworks, such as torchtune, easing adaptation to specific tasks (a loading sketch follows this list).
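
A quick way to check the tokenizer-efficiency claim is to count tokens for the same Russian text with both tokenizers. This is a minimal sketch, assuming both checkpoints expose standard Hugging Face tokenizers via AutoTokenizer; the Qwen/Qwen2.5-7B repository is used only as an illustrative baseline, and exact counts depend on the text.

```python
# Compare how many tokens the same Russian text costs under each tokenizer.
# Assumes both repos load with transformers' AutoTokenizer.
from transformers import AutoTokenizer

yandex_tok = AutoTokenizer.from_pretrained("yandex/YandexGPT-5-Lite-8B-pretrain")
qwen_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")  # illustrative baseline

text = (
    "Крупные языковые модели всё чаще применяются для обработки "
    "текстов на русском языке."
)

n_yandex = len(yandex_tok.encode(text))
n_qwen = len(qwen_tok.encode(text))

print(f"YandexGPT-5-Lite tokens: {n_yandex}")
print(f"Qwen2.5 tokens:          {n_qwen}")
print(f"Qwen / Yandex ratio:     {n_qwen / n_yandex:.2f}")
```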

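Because the layout is Llama-like, the checkpoint can typically be used with standard tooling. The following is a minimal loading-and-continuation sketch, assuming the repository works with transformers' AutoModelForCausalLM out of the box; dtype, device placement, and generation settings are illustrative.

```python
# Load the pre-trained checkpoint and generate a plain text continuation.
# Since this is a base model (no instruction tuning), prompt it with text
# to continue rather than chat-style instructions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yandex/YandexGPT-5-Lite-8B-pretrain"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # illustrative; choose what your hardware supports
    device_map="auto",
)

prompt = "Машинный перевод между русским и английским языками основан на"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
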
Good For

  • Research and Development: Ideal for researchers exploring large language models, especially those focusing on multilingual capabilities with a strong emphasis on Russian.
  • Custom Fine-tuning: Developers who want to fine-tune a capable 8B-parameter model for specific applications, leveraging its Llama-like architecture and extensive pre-training (a minimal LoRA sketch follows this section).
  • Applications Requiring Long Context: Suitable for tasks that benefit from a 32k token context window, particularly in Russian and English language processing.
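
For the fine-tuning use case above, the Llama-like layout means common adapter tooling generally applies. Below is a minimal LoRA sketch using transformers and peft; the dataset path (my_corpus.txt), the target module names (typical Llama-style projection names), and all hyperparameters are placeholder assumptions, not settings recommended by Yandex.

```python
# Minimal LoRA fine-tune of the pre-trained checkpoint on a plain-text corpus.
# Dataset path, target modules, and hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "yandex/YandexGPT-5-Lite-8B-pretrain"

tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach low-rank adapters to the attention projections
# (assumes Llama-style module names).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Placeholder corpus: any plain-text file works with the "text" loader.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="yandexgpt5-lite-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        logging_steps=10,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```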