NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged
Text Generation · Concurrency Cost: 1 · Model Size: 2.6B · Quant: BF16 · Ctx Length: 8K · Published: Sep 20, 2025 · License: gemma · Architecture: Transformer · Warm

NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged is a 2.6-billion-parameter language model based on Google's Gemma-2-2B architecture. It is fine-tuned for multilingual use, with a focus on Toki Pona, Russian, English, and Vietnamese, and pairs a custom tokenizer with a purpose-built dataset to improve performance across this language set.


Model Overview

NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged is a specialized language model built upon the robust Google Gemma-2-2B architecture. With 2.6 billion parameters and an 8192-token context length, this model is designed for efficient multilingual processing.
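As a merged checkpoint, the model should load directly with the standard Transformers `AutoModelForCausalLM` API. The sketch below is a minimal, hedged example: it assumes the repo id resolves on the Hugging Face Hub and that the checkpoint behaves like a plain causal LM (the prompt style is an illustration, not a documented format).

```python
# Minimal usage sketch for the merged checkpoint. Assumptions: the repo id
# below is available on the Hugging Face Hub, and the custom tokenizer ships
# inside the repo so AutoTokenizer picks it up automatically.

REPO_ID = "NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` with the merged model."""
    # Imported lazily so the heavy dependencies load only when generating.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Toki Pona prompt: "hello! how are you feeling?"
    print(generate("toki! sina pilin seme?"))
```

Running the script downloads roughly 5 GB of BF16 weights on first use; `device_map="auto"` can be added to `from_pretrained` to place the model on an available GPU.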

Key Capabilities

  • Multilingual Proficiency: Fine-tuned to excel in Toki Pona, Russian, English, and Vietnamese.
  • Custom Tokenization: Utilizes a custom tokenizer tailored for the specific linguistic characteristics of its target languages.
  • Specialized Dataset: Trained on the NetherQuartz/tatoeba-tokipona dataset, enhancing its understanding and generation capabilities for Toki Pona and other included languages.
  • Gemma-2-2B Base: Benefits from the foundational strengths and efficiency of the Gemma-2-2B model.
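One way to see the custom tokenizer at work is to compare how it segments a Toki Pona sentence against the base Gemma-2 tokenizer. This is a sketch under assumptions: the base repo id `google/gemma-2-2b` and the sample sentence are illustrative, and both downloads require Hub access (the base Gemma repo is gated and needs an accepted license).

```python
# Sketch: compare token counts for a Toki Pona sentence between the model's
# custom tokenizer and the base Gemma-2 tokenizer. Both repo ids are
# assumptions taken from the model card and the public Gemma release.

def count_tokens(repo_id: str, text: str) -> int:
    """Return how many tokens `repo_id`'s tokenizer produces for `text`."""
    from transformers import AutoTokenizer  # lazy import; needs Hub access
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return len(tokenizer.encode(text, add_special_tokens=False))

if __name__ == "__main__":
    sample = "mi moku e kili"  # Toki Pona: "I eat fruit"
    for repo in (
        "NetherQuartz/tatoeba-tok-multi-gemma-2-2b-merged",
        "google/gemma-2-2b",
    ):
        print(f"{repo}: {count_tokens(repo, sample)} tokens")
```

A tokenizer tailored to Toki Pona's small, regular vocabulary would typically yield fewer tokens per sentence than a general-purpose one, which translates to shorter sequences and cheaper inference.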

Good For

  • Toki Pona Applications: Ideal for projects involving the minimalist constructed language Toki Pona, including translation, text generation, or analysis.
  • Multilingual Text Processing: Suitable for tasks requiring simultaneous understanding or generation across Russian, English, and Vietnamese, particularly when Toki Pona is also a factor.
  • Research and Development: Useful for researchers exploring multilingual models with unique language combinations and custom tokenization strategies.