atsuki-yamaguchi/Qwen2.5-7B-Instruct-am-madlad-mean-tuned
The atsuki-yamaguchi/Qwen2.5-7B-Instruct-am-madlad-mean-tuned model is a 7.6-billion-parameter instruction-tuned language model based on Qwen2.5-7B-Instruct and adapted for Amharic. Its vocabulary is expanded with 10,000 additional target-language tokens, initialized using mean initialization, and the model was continually pre-trained on 500 million Amharic tokens sampled from the MADLAD-400 dataset, specializing it for Amharic language processing.
Overview
This model, atsuki-yamaguchi/Qwen2.5-7B-Instruct-am-madlad-mean-tuned, is a specialized version of the Qwen2.5-7B-Instruct base model, adapted for the Amharic language. It incorporates a significant vocabulary expansion, adding 10,000 target-language tokens whose embedding and LM head weights are initialized using a mean initialization strategy.
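A minimal loading and generation sketch using the Hugging Face transformers library is shown below. The dtype, device placement, prompt, and generation settings are illustrative assumptions, not values prescribed by this card.

```python
# Minimal inference sketch; settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "atsuki-yamaguchi/Qwen2.5-7B-Instruct-am-madlad-mean-tuned"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen2.5-Instruct models use a chat template; the prompt is a placeholder.
messages = [{"role": "user", "content": "Translate 'hello' into Amharic."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```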
Key Capabilities
- Amharic Language Specialization: Continually pre-trained on 500 million Amharic tokens from the MADLAD-400 dataset, enhancing its proficiency in Amharic.
- Expanded Vocabulary: Features an additional 10,000 target-vocabulary tokens, specifically for Amharic, to improve language representation (see the quick check after this list).
- Instruction-Tuned Base: Built upon the Qwen2.5-7B-Instruct architecture, retaining its instruction-following capabilities.
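The vocabulary expansion can be verified by comparing tokenizer sizes against the base model. This is a sketch; the exact difference depends on how the added tokens interact with the base tokenizer, but it should be roughly 10,000.

```python
# Compare tokenizer sizes: base Qwen2.5-7B-Instruct vs. the Amharic-adapted model.
from transformers import AutoTokenizer

base = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
adapted = AutoTokenizer.from_pretrained(
    "atsuki-yamaguchi/Qwen2.5-7B-Instruct-am-madlad-mean-tuned"
)
# Expect a difference of roughly 10,000 added Amharic tokens.
print(len(base), len(adapted), len(adapted) - len(base))
```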
Training Details
The model underwent continual pre-training on a substantial corpus of Amharic language data. The embedding and LM head weights for the new target-vocabulary tokens were initialized using mean initialization, as detailed in the associated paper.
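The sketch below illustrates one common variant of mean initialization, in which each new embedding and LM head row is set to the mean of the pre-existing rows. It is an illustration of the general technique under that assumption, not the authors' exact code, and the token strings are hypothetical placeholders.

```python
# Illustrative sketch of mean initialization for an expanded vocabulary.
# Not the authors' exact implementation; token strings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

old_size = model.get_input_embeddings().weight.shape[0]
# Stand-in for the 10,000 Amharic tokens added in practice.
num_added = tokenizer.add_tokens(["<am_tok_1>", "<am_tok_2>"])
model.resize_token_embeddings(old_size + num_added)

with torch.no_grad():
    emb = model.get_input_embeddings().weight    # (vocab, hidden)
    head = model.get_output_embeddings().weight  # LM head, resized alongside
    # Set every new row to the mean of the pre-existing rows.
    emb[old_size:] = emb[:old_size].mean(dim=0)
    head[old_size:] = head[:old_size].mean(dim=0)
```

Compared with random initialization, mean initialization keeps the new rows inside the distribution of the pre-trained embedding space, which tends to stabilize the early stages of continual pre-training.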
Good For
- Applications requiring robust Amharic language understanding and generation.
- Research into vocabulary expansion and adaptation techniques for low-resource languages.
- Developers looking for an instruction-tuned model with enhanced Amharic linguistic capabilities.