NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged
Text generation · Model size: 2.6B · Quant: BF16 · Context length: 8k · Published: Sep 20, 2025 · License: Gemma · Architecture: Transformer · Concurrency cost: 1
NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged is a 2.6-billion-parameter language model based on the Google Gemma-2-2B architecture. It is fine-tuned for multilingual use, focusing on Toki Pona, Russian, English, and Vietnamese, and pairs a custom tokenizer with a targeted dataset to improve performance on this uncommon language combination.
Model Overview
NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged is a specialized language model built upon the robust Google Gemma-2-2B architecture. With 2.6 billion parameters and an 8192-token context length, this model is designed for efficient multilingual processing.
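As a quick orientation, the sketch below builds a prompt in the turn-marker format used by Gemma-2 instruction-tuned checkpoints. Note the assumptions: the model card does not state which prompt template this fine-tune expects, and the `build_gemma_prompt` helper and the example translation request are illustrative, not taken from the card.

```python
def build_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in Gemma-2-style turn markers.

    Whether this particular fine-tune expects these markers is an
    assumption; check the tokenizer's chat template before relying on it.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Hypothetical request playing to the model's Toki Pona focus.
prompt = build_gemma_prompt("Translate to Toki Pona: Hello, friend!")
print(prompt)
```

In practice you would pass `prompt` to the model's tokenizer and generation loop (e.g. via the Hugging Face `transformers` text-generation pipeline), letting the tokenizer's built-in chat template take precedence if one is defined.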
Key Capabilities
- Multilingual Proficiency: Fine-tuned to excel in Toki Pona, Russian, English, and Vietnamese.
- Custom Tokenization: Utilizes a custom tokenizer tailored for the specific linguistic characteristics of its target languages.
- Specialized Dataset: Trained on the NetherQuartz/tatoeba-tokipona dataset, enhancing its understanding and generation capabilities for Toki Pona and other included languages.
- Gemma-2-2B Base: Benefits from the foundational strengths and efficiency of the Gemma-2-2B model.
Good For
- Toki Pona Applications: Ideal for projects involving the minimalist constructed language Toki Pona, such as translation, text generation, and analysis.
- Multilingual Text Processing: Suitable for tasks requiring simultaneous understanding or generation across Russian, English, and Vietnamese, particularly when Toki Pona is also a factor.
- Research and Development: Useful for researchers exploring multilingual models with unique language combinations and custom tokenization strategies.