Overview
T-pro-it-1.0 is a 32.8-billion-parameter language model developed by t-tech, built on the Qwen 2.5 architecture. It has undergone extensive continual pre-training and alignment with a strong focus on the Russian language:
- Stage 1 continual pre-training: 100 billion tokens of diverse Russian data (Common Crawl, books, code, proprietary datasets) mixed with re-played English data.
- Stage 2 continual pre-training: 40 billion tokens of mixed instruction and pre-training data.
- Supervised Fine-Tuning (SFT): 1 billion tokens.
- Preference Tuning: 1 billion tokens to enhance helpfulness.
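For a quick sanity check, the model can be loaded with the Hugging Face transformers library. The following is a minimal sketch, assuming the checkpoint is published on the Hub as t-tech/T-pro-it-1.0 and that a transformers version with Qwen 2.5 support is installed; the memory estimate in the comment is derived from the parameter count, not from the model card.

```python
# Minimal generation sketch. Assumes the checkpoint is available as
# "t-tech/T-pro-it-1.0" and a recent transformers release is installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "t-tech/T-pro-it-1.0"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # ~66 GB of accelerator memory at 32.8B params in bf16
    device_map="auto",
)

# Qwen 2.5-style chat template; the Russian prompt asks the model to
# explain the Pythagorean theorem ("Объясни теорему Пифагора").
messages = [{"role": "user", "content": "Объясни теорему Пифагора."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```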
Key Capabilities & Performance
T-pro-it-1.0 demonstrates strong performance across a suite of Russian benchmarks, often outperforming comparable open-source models and showing competitive results against proprietary models like GPT-4o-mini and GigaChat Max 1.0.26.20. Notable strengths include:
- Mathematical Reasoning: Achieves 0.941 on ruGSM8K and 0.776 on ruMATH, surpassing many proprietary and open-source alternatives.
- Code Generation: Scores 0.432 / 0.626 / 0.677 (pass@1 / pass@5 / pass@10) on ruCodeEval, indicating robust coding capabilities in Russian.
- General Russian Understanding: Leads open-source models on MERA (0.629), MaMuRaMu (0.841), ruMMLU-PRO (0.665), Arena-Hard-Ru (90.17), MT Bench Ru (8.7), and Alpaca Eval Ru (47.61).
Intended Use
This model is primarily designed for further fine-tuning and is not intended as a ready-to-use conversational assistant. Users are responsible for additional training and oversight to ensure the model meets ethical and safety standards before industrial or commercial deployment. Its strong Russian-language foundation and performance on mathematical and coding benchmarks make it an excellent base for applications requiring high-quality Russian language understanding and generation, especially in technical domains.
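Since fine-tuning is the intended path, the sketch below outlines one common approach: parameter-efficient supervised fine-tuning with LoRA via the peft and trl libraries. The dataset path, target modules, and hyperparameters are illustrative assumptions, not t-tech's published recipe.

```python
# Illustrative LoRA fine-tuning sketch using peft + trl. The dataset file,
# hyperparameters, and output path are assumptions, not the authors' setup.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_name = "t-tech/T-pro-it-1.0"

# A model of this size typically needs multi-GPU or memory-saving setups
# (e.g. accelerate/DeepSpeed); this sketch keeps the loading path simple.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
)

# Placeholder dataset: any instruction data with a "text" or "messages"
# column works; substitute your own Russian instruction corpus.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="t-pro-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-4,
        num_train_epochs=1,
        bf16=True,
    ),
)
trainer.train()
```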