rinna/gemma-2-baku-2b

Parameters: 2.6B
Precision: BF16
Context length: 8192 tokens
License: gemma

Overview

rinna/gemma-2-baku-2b is a 2.6-billion-parameter language model developed by rinna and based on Google's Gemma 2 architecture. It underwent continual pre-training on approximately 80 billion tokens drawn from a diverse mixture of Japanese and English datasets, including Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Wikipedia, and rinna's curated Japanese dataset. This continual pre-training substantially improves the model's Japanese language capabilities.
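
As a quick orientation, the snippet below loads the checkpoint with the Hugging Face transformers library in BF16 and generates a short Japanese continuation. It is a minimal sketch: the prompt and sampling parameters are illustrative, and the assumption that the checkpoint loads through AutoModelForCausalLM (as Gemma 2 checkpoints typically do) is ours rather than a statement from rinna's documentation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and the BF16 weights noted above.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to(device)

# Illustrative Japanese prompt; as a base (not instruction-tuned) model,
# plain text completion is the natural usage pattern.
prompt = "西田幾多郎は、"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sampling settings below are assumptions, not official recommendations.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```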

Key Capabilities

  • Enhanced Japanese Language Performance: Optimized through continual pre-training on substantial Japanese corpora.
  • Gemma 2 Architecture: Leverages the robust 26-layer, 2304-hidden-size transformer architecture of Gemma 2.
  • 8192-Token Context Length: Supports processing of longer input sequences.
  • Original Gemma 2 Tokenizer: Ensures compatibility and consistent tokenization with the base Gemma 2 models (see the sketch after this list).
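
The tokenizer and context-length claims can be checked directly. The sketch below compares tokenization against the base google/gemma-2-2b repository and reads the context length from the model config; it assumes both repositories are accessible (the Google Gemma repos are gated on Hugging Face, so accepting their terms may be required).

```python
from transformers import AutoConfig, AutoTokenizer

# Tokenizer from the continually pre-trained model vs. the base Gemma 2 model.
baku = AutoTokenizer.from_pretrained("rinna/gemma-2-baku-2b")
base = AutoTokenizer.from_pretrained("google/gemma-2-2b")

text = "日本語のトークン化が基盤モデルと一致するか確認します。"
assert baku.encode(text) == base.encode(text), "tokenizations differ"

# The advertised context length can be read from the model configuration.
config = AutoConfig.from_pretrained("rinna/gemma-2-baku-2b")
print(config.max_position_embeddings)  # expected: 8192
```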

Good For

  • Japanese NLP Applications: Ideal for tasks requiring strong understanding and generation in Japanese.
  • Research and Development: A solid base model for further fine-tuning on specific Japanese-centric tasks (a minimal LoRA sketch follows this list).
  • Multilingual Contexts: Benefits from its mixed Japanese and English training data, offering potential for bilingual applications.
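
For the fine-tuning use case, a common starting point is parameter-efficient adaptation. The sketch below attaches LoRA adapters with the peft library; the rank, dropout, and target module names (the attention projections in the Hugging Face Gemma 2 implementation) are illustrative assumptions, not recommendations from rinna.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Load the base model in BF16.
model = AutoModelForCausalLM.from_pretrained(
    "rinna/gemma-2-baku-2b",
    torch_dtype=torch.bfloat16,
)

# Illustrative LoRA configuration; hyperparameters are assumptions.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, the wrapped model can be trained on Japanese task data with a
# standard transformers Trainer or a custom training loop.
```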