Overview
rinna/gemma-2-baku-2b is a 2.6-billion-parameter language model developed by rinna and built on Google's Gemma 2 architecture. It underwent continual pre-training on approximately 80 billion tokens drawn from a mixture of Japanese and English datasets, including Japanese CC-100, Japanese C4, Japanese OSCAR, The Pile, Wikipedia, and rinna's curated Japanese dataset. This continual pre-training substantially improves the model's Japanese language performance relative to the original Gemma 2 base model.
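The model can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch, assuming transformers, torch, and accelerate are installed; the dtype, device placement, prompt, and sampling parameters are illustrative assumptions, and because this is a base model (not instruction-tuned) the prompt is plain text to be continued.

```python
# Minimal usage sketch for rinna/gemma-2-baku-2b (base model: plain-text continuation).
# dtype, device_map, and generation settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rinna/gemma-2-baku-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; use float32 on CPU if bf16 is unavailable
    device_map="auto",           # requires the accelerate package
)

# Plain-text Japanese prompt to be continued ("The highest mountain in Japan is ...")
prompt = "日本で一番高い山は"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```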
Key Capabilities
- Enhanced Japanese Language Performance: Optimized through continual pre-training on substantial Japanese corpora.
- Gemma 2 Architecture: Leverages the robust 26-layer, 2304-hidden-size transformer architecture of Gemma 2.
- 8192-Token Context Length: Supports processing of longer input sequences.
- Original Gemma 2 Tokenizer: Ensures compatibility and consistent tokenization with the base Gemma 2 models (see the tokenizer sketch after this list).
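Because the tokenizer is the unchanged Gemma 2 tokenizer, it can be inspected on its own, for example to check how Japanese text is segmented and whether a prompt fits within the 8192-token context window. The sketch below uses an illustrative sample sentence.

```python
# Minimal sketch: inspecting the Gemma 2 tokenizer shipped with the model and
# checking a prompt length against the 8192-token context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rinna/gemma-2-baku-2b")

text = "自然言語処理の研究は急速に進歩しています。"  # sample Japanese sentence (assumption)
token_ids = tokenizer(text)["input_ids"]
print(f"Token count: {len(token_ids)}")
print(tokenizer.convert_ids_to_tokens(token_ids))

MAX_CONTEXT = 8192  # context length supported by the model
assert len(token_ids) <= MAX_CONTEXT, "Prompt exceeds the 8192-token context window"
```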
Good For
- Japanese NLP Applications: Ideal for tasks requiring strong understanding and generation in Japanese.
- Research and Development: A solid base model for further fine-tuning on specific Japanese-centric tasks (a fine-tuning sketch follows this list).
- Multilingual Contexts: Benefits from its mixed Japanese and English training data, offering potential for bilingual applications.
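For the fine-tuning use case, one common approach is parameter-efficient adaptation with LoRA via the peft library. The following is a minimal sketch, not the model authors' recipe; the rank, alpha, dropout, target modules, and precision are assumptions to be adjusted for the actual downstream task and hardware.

```python
# Minimal sketch: attaching LoRA adapters to rinna/gemma-2-baku-2b for downstream
# Japanese fine-tuning with the peft library. All hyperparameters are assumptions.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "rinna/gemma-2-baku-2b",
    torch_dtype=torch.bfloat16,  # assumed precision
)

lora_config = LoraConfig(
    r=16,                        # assumed LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumption)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of the 2.6B parameters is trained
```

The wrapped model can then be trained with a standard causal-language-modeling loop (for example, the transformers Trainer) on task-specific Japanese data.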