yandex/YandexGPT-5-Lite-8B-pretrain
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Context Length: 8k · Published: Feb 21, 2025 · License: yandexgpt-5-lite-8b · Architecture: Transformer
YandexGPT-5-Lite-8B-pretrain is an 8-billion-parameter pre-trained large language model developed by Yandex, with a 32k-token native context length (served here with an 8k window). It was trained on 15T tokens, primarily Russian and English text, with a significant share of web pages, code, and mathematics. The model matches or surpasses state-of-the-art pre-trained models of its size class on global benchmarks, and is particularly strong in Russian thanks to a tokenizer optimized for the language.
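A minimal sketch of loading the checkpoint locally with Hugging Face `transformers`, assuming the repo id `yandex/YandexGPT-5-Lite-8B-pretrain` on the Hub and a recent `transformers` release; treat it as illustrative rather than official usage:

```python
# Minimal sketch: load the pretrained checkpoint and sample a completion.
# Repo id and dtype choice are assumptions, not taken from this page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yandex/YandexGPT-5-Lite-8B-pretrain"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B weights fit in roughly 16 GB at bf16
    device_map="auto",
)

# This is a base (pretrained) model, not instruction-tuned, so prompt it
# as a text completer rather than a chat assistant.
prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```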
Popular Sampler Settings
Top 3 parameter combinations used by Featherless users for this model.
temperature: –
top_p: –
top_k: –
frequency_penalty: –
presence_penalty: –
repetition_penalty: –
min_p: –
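Since the list above names the sampler parameters without recorded values, here is a hedged sketch of how such settings would be passed to an OpenAI-compatible completions endpoint. The base URL, the placeholder values, and the `extra_body` pass-through for parameters outside the OpenAI spec are all assumptions:

```python
# Hedged sketch: applying sampler settings of the kind listed above via an
# OpenAI-compatible completions API. Endpoint URL and all parameter values
# are placeholders, not settings recorded on this page.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.featherless.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

completion = client.completions.create(
    model="yandex/YandexGPT-5-Lite-8B-pretrain",
    prompt="Machine learning is",
    max_tokens=64,
    # Standard OpenAI sampler parameters:
    temperature=0.8,
    top_p=0.95,
    frequency_penalty=0.0,
    presence_penalty=0.0,
    # Parameters outside the OpenAI spec go through extra_body,
    # assuming the server accepts them:
    extra_body={"top_k": 40, "min_p": 0.05, "repetition_penalty": 1.05},
)
print(completion.choices[0].text)
```

The plain completions endpoint is used rather than chat completions because this is a base pre-trained model with no chat template.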