T-lite-it-1.0: A Qwen 2.5-based Model for Russian Language Tasks
T-lite-it-1.0 is a 7.6 billion parameter model developed by t-tech, leveraging the Qwen 2.5 architecture. It has undergone extensive continual pre-training and alignment, with a strong focus on Russian-language data. The first pre-training stage covered 100 billion tokens of diverse Russian data (Common Crawl, books, code, proprietary datasets) mixed with replayed English data; a second stage added 40 billion tokens of mixed instruction and pre-training data. Supervised fine-tuning then used 1 billion tokens of diverse instruction data, and preference tuning used another 1 billion tokens to enhance helpfulness.
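For quick reference, the staged token budgets above can be summarized as a plain data structure. This is only an illustrative sketch: the stage names, source labels, and layout are assumptions for readability, not t-tech's published training configuration.

```python
# Illustrative summary of the staged training recipe described above.
# All keys and source labels are hypothetical; t-tech has not published
# its training configuration in this form.
TRAINING_STAGES = {
    "continual_pretraining_1": {
        "tokens": 100_000_000_000,  # diverse Russian data + replayed English
        "sources": ["Common Crawl (ru)", "books", "code",
                    "proprietary datasets", "English replay"],
    },
    "continual_pretraining_2": {
        "tokens": 40_000_000_000,  # mixed instruction and pre-training data
    },
    "supervised_fine_tuning": {
        "tokens": 1_000_000_000,  # diverse instruction data
    },
    "preference_tuning": {
        "tokens": 1_000_000_000,  # preference data to enhance helpfulness
    },
}
```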
Key Capabilities and Performance
T-lite-it-1.0 demonstrates strong performance on Russian-specific benchmarks, often outperforming its base model, Qwen-2.5-7B-Instruct, as well as comparable models such as GigaChat Pro and RuAdapt-Qwen-7B-Instruct-v1. On each of the benchmarks below, it posted the highest score among the compared models:
| Benchmark | Score |
| --- | --- |
| MERA | 0.552 |
| MaMuRaMu | 0.775 |
| ruMMLU-PRO | 0.497 |
| ruGSM8K | 0.856 |
| ruMATH | 0.679 |
| ruMBPP | 0.693 |
| Arena-Hard-Ru | 64.38 |
| Alpaca Eval Ru | 39.61 |
Intended Use and Differentiators
This model is primarily designed for further fine-tuning rather than as a ready-to-use conversational assistant. Its extensive pre-training on Russian data makes it particularly well suited as a foundation for applications requiring high proficiency in Russian, and developers can build on its strong Russian NLP benchmark performance to create specialized solutions. What sets it apart is its targeted optimization for Russian, which yields superior results on several key Russian benchmarks relative to other instruction-tuned models in its class.
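Since the model is positioned as a base for further fine-tuning, a natural starting point is parameter-efficient fine-tuning with LoRA adapters. The sketch below is minimal and non-authoritative: it assumes the checkpoint is published on Hugging Face as `t-tech/T-lite-it-1.0` (consistent with t-tech's naming), and the LoRA hyperparameters are generic placeholders rather than values recommended by t-tech.

```python
# A minimal sketch of preparing T-lite-it-1.0 for further fine-tuning with
# LoRA adapters. The model ID is assumed, and the hyperparameters are
# placeholders, not recommendations from t-tech.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "t-tech/T-lite-it-1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

# Attach LoRA adapters to the attention projections so that only a small
# set of adapter weights is trained, keeping the base model frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only adapter weights train

# From here, train with your preferred loop or trainer (e.g. transformers
# Trainer or trl's SFTTrainer) on a Russian instruction dataset.
```

Because the model derives from Qwen 2.5, the standard Qwen projection module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are a reasonable default for `target_modules`, but they should be verified against the actual checkpoint's module tree before training.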