rinna/gemma-2-baku-2b

  • Parameters: 2.6B
  • Tensor type: BF16
  • Context length: 8192
  • Released: Oct 1, 2024
  • License: gemma
  • Hosted on: Hugging Face

rinna/gemma-2-baku-2b is a 2.6 billion parameter transformer-based language model, continually pre-trained by rinna on 80 billion tokens of mixed Japanese and English data. Built on Google's Gemma 2 architecture, the model is specifically optimized for Japanese language tasks. It retains the 8192-token context length and the original Gemma 2 tokenizer, making it suitable for applications requiring strong Japanese language understanding and generation.

Overview

rinna/gemma-2-baku-2b is a 2.6 billion parameter language model developed by rinna, based on Google's Gemma 2 architecture. It underwent continual pre-training on approximately 80 billion tokens from a diverse mixture of Japanese and English datasets, including Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Wikipedia, and rinna's curated Japanese dataset. This extensive pre-training significantly improves its capabilities for Japanese language processing.
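Since this is a standard Gemma 2 checkpoint, it can be loaded with the Hugging Face transformers library. The sketch below is a minimal, assumed usage example rather than the card's official snippet: the sampling settings (`temperature`, `top_p`, `max_new_tokens`) and the Japanese prompt are illustrative choices, and it treats the model as a base (not instruction-tuned) model that continues a given text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
    device_map="auto",
)

# Base model: provide text for it to continue, not a chat-style instruction.
prompt = "西田幾多郎は、"  # "Kitaro Nishida is/was, ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,  # sampling parameters are illustrative, not from this card
    top_p=0.9,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
```

For fine-tuning or serving, the same `model_id` works anywhere a Gemma 2 checkpoint is accepted, since the architecture and tokenizer are unchanged.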

Key Capabilities

  • Enhanced Japanese Language Performance: Optimized through continual pre-training on substantial Japanese corpora.
  • Gemma 2 Architecture: Leverages the robust 26-layer, 2304-hidden-size transformer architecture of Gemma 2.
  • 8192-Token Context Length: Supports processing of longer input sequences.
  • Original Gemma 2 Tokenizer: Ensures compatibility and consistent tokenization with the base Gemma 2 models.

Good For

  • Japanese NLP Applications: Ideal for tasks requiring strong understanding and generation in Japanese.
  • Research and Development: A solid base model for further fine-tuning on specific Japanese-centric tasks.
  • Multilingual Contexts: Benefits from its mixed Japanese and English training data, offering potential for bilingual applications.