Defetya/gemma-2b-ru

Hosted on Hugging Face · Text generation · Model size: 2.6B · Quantization: BF16 · Context length: 8k · License: apache-2.0 · Architecture: Transformer (open weights)

Defetya/gemma-2b-ru is a 2.6-billion-parameter Gemma-based language model from Defetya, optimized for Russian-language fluency. Starting from Google's Gemma 2B, it underwent a second stage of pre-training on roughly 150 billion tokens drawn from the English and Russian subsets of the OSCAR and Wiki datasets. It is a base model intended for further fine-tuning, aiming to strengthen cross-linguistic capabilities and serve as a strong open-source Russian LLM.


Defetya/gemma-2b-ru: A Russian-Optimized Gemma Base Model

This model, developed by Defetya, is a 2.6-billion-parameter variant of Google's Gemma 2B, specifically enhanced for the Russian language. It underwent a significant second stage of pre-training on approximately 150 billion tokens drawn from a combination of the English and Russian subsets of the OSCAR and Wiki datasets.
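Since this is a standard Gemma-architecture checkpoint, it should load with the usual Hugging Face Transformers causal-LM API. A minimal sketch, assuming the checkpoint is published under `Defetya/gemma-2b-ru` on the Hub (the prompt and generation settings are illustrative, not from the model card):

```python
# Minimal inference sketch with Hugging Face Transformers.
# Assumes the BF16 checkpoint is available as "Defetya/gemma-2b-ru".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Defetya/gemma-2b-ru"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the base model and continue the given Russian (or English) prompt."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the card's BF16 quantization
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Base (non-instruct) model: expects raw text continuation, not chat turns.
    print(generate("Москва — это"))
```

Note that as a raw pre-trained model it performs text continuation only; it has no instruction-following or chat formatting.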

Key Capabilities & Characteristics

  • Russian Language Focus: Designed to achieve high fluency in Russian, building upon the foundational capabilities of Gemma 2B.
  • Pre-trained Foundation: This is a raw, pre-trained model, intended as a strong base for subsequent fine-tuning tasks.
  • Cross-Linguistic Research: Aims to contribute to research on cross-linguistic capabilities in open-source large language models.
  • Training Details: Pre-trained with a fork of the JAX-based EasyLM framework on a Google TPU v4-32, reaching a final training loss of approximately 1.5.

Intended Use Cases

  • Further Fine-tuning: Ideal for developers and researchers looking to fine-tune a robust base model for specific Russian-language applications.
  • Russian NLP Development: Serves as a foundational component for creating advanced open-source LLMs fluent in Russian.
  • Research: Suitable for exploring and advancing cross-linguistic understanding and generation in LLMs.