URajinda/Qwen2.5-MM-1.5B-Base

Hugging Face

  • Task: Text generation
  • Concurrency cost: 1
  • Model size: 1.5B
  • Quantization: BF16
  • Context length: 32k
  • Published: Dec 26, 2025
  • License: apache-2.0
  • Architecture: Transformer
  • Open weights; warm

URajinda/Qwen2.5-MM-1.5B-Base is a 1.54 billion parameter language model, adapted from Qwen2.5-1.5B and optimized for the Myanmar language. It extends the vocabulary with 500+ Myanmar tokens, reducing token counts for Myanmar text by 75-85% and enabling 4x faster training for the language. The model is intended as a base for fine-tuning Myanmar language applications, offering more efficient Myanmar script processing.


Overview

URajinda/Qwen2.5-MM-1.5B-Base is a specialized 1.54 billion parameter language model built upon the Qwen 2.5 1.5B architecture, developed as part of the "ShweYon" project. Its primary innovation lies in its deep adaptation for the Myanmar language, featuring a custom tokenizer with an extended vocabulary.

Key Features

  • Myanmar Language Optimization: The model incorporates over 500 new Myanmar tokens into its vocabulary, expanding the total vocabulary size to 152,165 tokens from the original 151,665.
  • Tokenization Efficiency: This adaptation yields a 75-85% reduction in token count when processing Myanmar text, substantially lowering the cost per character of Myanmar script.
  • Accelerated Training: The more compact tokenization translates into a 4x increase in training speed on Myanmar language tasks.
  • Base Model for Fine-tuning: It is designed as a base model, ready for further fine-tuning on Myanmar language applications, while remaining compatible with the original Qwen2.5 architecture (see the loading sketch below).
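
A minimal loading-and-comparison sketch using the standard transformers APIs. It assumes the repository id above resolves on the Hugging Face Hub and uses Qwen/Qwen2.5-1.5B as the baseline tokenizer (an assumption; the card does not name the exact reference tokenizer):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "URajinda/Qwen2.5-MM-1.5B-Base"

# Myanmar-adapted tokenizer from this repo, plus the original Qwen2.5
# tokenizer for a side-by-side token-count comparison.
mm_tok = AutoTokenizer.from_pretrained(repo_id)
base_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")

text = "မင်္ဂလာပါ၊ မြန်မာဘာသာစကား။"  # "Hello, the Myanmar language."
print("extended vocab size:", len(mm_tok))  # ~152,165 per this card
print("tokens (adapted):", len(mm_tok(text)["input_ids"]))
print("tokens (original):", len(base_tok(text)["input_ids"]))

# Load the weights in BF16, the precision listed in the card metadata.
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
```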

Usage Considerations

This model is particularly useful for developers working with Myanmar language data, offering an efficient foundation. Although it inherits general language understanding from Qwen2.5, it still requires further Myanmar-specific pre-training to reach its full potential in complex applications. The developer notes that redefining the tokenizer around a small, dense language-specific vocabulary can be more stable and faster than relying on a generic multilingual tokenizer; a sketch of that technique follows.
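
For context, here is a minimal, hypothetical sketch of the vocabulary-extension technique described above, using the standard add_tokens / resize_token_embeddings APIs from transformers. The token list is illustrative only; the card does not publish the actual 500+ Myanmar tokens:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative tokens only; the real 500+ Myanmar tokens are not listed here.
new_tokens = ["မြန်မာ", "ရန်ကုန်", "ကျေးဇူး"]

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B", torch_dtype=torch.bfloat16
)

num_added = tokenizer.add_tokens(new_tokens)
# Grow the input/output embedding matrices so the new ids have rows to train.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab is now {len(tokenizer)}")
```

Newly added embedding rows start effectively untrained, which is consistent with the card's note that further Myanmar-specific pre-training is required before the model reaches its full potential.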