atsuki-yamaguchi/Qwen2.5-7B-Instruct-my-madlad-mean-tuned

Text generation · Model size: 7.6B · Quantization: FP8 · Context length: 32K · Published: Nov 22, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

The atsuki-yamaguchi/Qwen2.5-7B-Instruct-my-madlad-mean-tuned model is a 7.6 billion parameter instruction-tuned causal language model, continually pre-trained from Qwen2.5-7B-Instruct. Developed by Atsuki Yamaguchi, it is adapted specifically to the Burmese language, with an expanded target vocabulary of 10K tokens whose weights were mean-initialized. The adaptation targets strong Burmese understanding and generation, making the model a suitable choice for applications centred on that language.


Qwen2.5 7B Instruct for Burmese: Vocabulary Expansion

This model, developed by Atsuki Yamaguchi, is an adaptation of the Qwen2.5 7B Instruct base model to the Burmese language through vocabulary expansion and continual pre-training. It has 7.6 billion parameters and supports a 131,072-token context length.
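The model can be loaded through the standard Hugging Face transformers API. The sketch below assumes the repository ships full merged weights and retains the Qwen2.5 chat template; the Burmese prompt and generation settings are purely illustrative, not taken from this card.

```python
# Minimal loading/generation sketch (assumptions noted in the text above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "atsuki-yamaguchi/Qwen2.5-7B-Instruct-my-madlad-mean-tuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "မင်္ဂလာပါ"}]  # "Hello" in Burmese
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; sampling parameters are left at illustrative defaults.
output = model.generate(inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```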

Key Capabilities

  • Burmese Language Specialization: Optimized for processing and generating text in Burmese.
  • Expanded Vocabulary: Includes an additional 10,000 target vocabulary tokens, enhancing its linguistic coverage for Burmese.
  • Mean Initialization: The embedding and LM-head weights for the new target tokens were mean-initialized (a simplified sketch follows this list).
  • Continual Pre-training: Underwent continual pre-training on 500 million Burmese language tokens sampled from the MADLAD-400 dataset.
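As a rough illustration of the vocabulary-expansion and mean-initialization steps above, the sketch below adds a handful of hypothetical Burmese tokens to the base Qwen2.5 tokenizer and initializes each new embedding and LM-head row as the mean of the rows of the source subwords that spell it. This is a simplified stand-in for the actual 10K-token pipeline, not the author's exact code.

```python
# Hedged sketch of vocabulary expansion with mean initialization.
# The new tokens below are illustrative; the real card adds ~10K Burmese tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

new_tokens = ["မြန်မာနိုင်ငံ", "ကျေးဇူးတင်ပါတယ်"]  # hypothetical Burmese tokens

# Decompose each new token into existing subword IDs *before* extending the vocab.
sub_ids = [tokenizer(t, add_special_tokens=False)["input_ids"] for t in new_tokens]

num_added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    emb = model.get_input_embeddings().weight    # input embedding matrix
    head = model.get_output_embeddings().weight  # LM head (untied in Qwen2.5-7B)
    for tok, ids in zip(new_tokens, sub_ids):
        new_id = tokenizer.convert_tokens_to_ids(tok)
        # Mean initialization: the new row is the mean of the rows of the
        # source subwords that make up the new token.
        emb[new_id] = emb[torch.tensor(ids)].mean(dim=0)
        head[new_id] = head[torch.tensor(ids)].mean(dim=0)
```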

Good for

  • Applications requiring high-quality Burmese language understanding and generation.
  • Research and development in low-resource language NLP, specifically for Burmese.
  • Tasks such as translation, summarization, and conversational AI in Burmese.

For more technical details, refer to the associated paper: Adapting Chat Language Models Using Only Target Unlabeled Language Data.