Defetya/gemma-2b-ru: A Russian-Optimized Gemma Base Model
This model, developed by Defetya, is a 2.6 billion parameter variant of Google's Gemma 2B, specifically enhanced for the Russian language. It underwent a second stage of pre-training on approximately 150 billion tokens drawn from the English and Russian subsets of the OSCAR and Wikipedia datasets.
Key Capabilities & Characteristics
- Russian Language Focus: Designed to achieve high fluency in Russian, building upon the foundational capabilities of Gemma 2B.
- Pre-trained Foundation: This is a raw, pre-trained model, intended as a strong base for subsequent fine-tuning tasks.
- Cross-Linguistic Research: Aims to contribute to research on cross-linguistic capabilities in open-source large language models.
- Training Details: Pre-trained using a fork of the JAX-based EasyLM framework on a Google TPU v4-32 pod, reaching a final training loss of approximately 1.5.
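The reported training loss can be translated into perplexity, which is often easier to interpret, since perplexity is simply the exponential of the mean cross-entropy loss:

```python
import math

# The model card reports a final training loss of roughly 1.5;
# perplexity = exp(cross-entropy loss).
training_loss = 1.5
perplexity = math.exp(training_loss)
print(f"Training loss {training_loss} corresponds to a perplexity of about {perplexity:.2f}")
```

A loss of 1.5 thus corresponds to a per-token perplexity of roughly 4.5 on the training distribution.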
Intended Use Cases
- Further Fine-tuning: Ideal for developers and researchers looking to fine-tune a robust base model for specific Russian-language applications.
- Russian NLP Development: Serves as a foundational component for creating advanced open-source LLMs fluent in Russian.
- Research: Suitable for exploring and advancing cross-linguistic understanding and generation in LLMs.
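As a base model intended for further fine-tuning, it should load through the standard Hugging Face `transformers` API; the sketch below is a minimal, unverified example, assuming the model is hosted on the Hub under the ID `Defetya/gemma-2b-ru` with Gemma-compatible weights and tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub ID, taken from the model name above.
model_id = "Defetya/gemma-2b-ru"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# This is a raw pre-trained (non-instruct) model, so prompt it with
# plain text to continue rather than chat-style turns.
prompt = "Москва — столица"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For downstream Russian-language tasks, this same loading code would typically be followed by a supervised fine-tuning step rather than used for generation directly.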