baban/QwenTranslate_Hindi_English
The baban/QwenTranslate_Hindi_English model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct, specifically optimized for Hindi-English machine translation tasks. This model leverages a 32768-token context length and was trained with a focus on translation accuracy, achieving a loss of 0.9161 on its evaluation set. It is designed for applications requiring robust translation capabilities between Hindi and English.
Loading preview...
Model Overview
The baban/QwenTranslate_Hindi_English model is a specialized machine translation model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct base architecture. With 3.1 billion parameters and a substantial 32768-token context length, this model is engineered for high-quality translation between Hindi and English.
Key Capabilities
- Hindi-English Machine Translation: The primary function of this model is to translate text between Hindi and English, leveraging its fine-tuned training on the MT_Hindi_En dataset.
- Qwen2.5-3B-Instruct Foundation: Built upon the robust Qwen2.5-3B-Instruct model, it inherits strong language understanding and generation capabilities, adapted for translation.
- Optimized Training: The model was trained using specific hyperparameters, including a learning rate of 5e-05, a total batch size of 1024, and 3 epochs, resulting in an evaluation loss of 0.9161.
Intended Use Cases
This model is particularly well-suited for applications requiring accurate and efficient translation between Hindi and English. Potential use cases include:
- Document Translation: Translating documents, articles, or web content between the two languages.
- Cross-Lingual Communication: Facilitating communication in environments where both Hindi and English are spoken.
- Content Localization: Adapting content for Hindi or English-speaking audiences.
Training Details
The model was trained using Transformers 4.55.0, Pytorch 2.5.1+cu124, Datasets 3.6.0, and Tokenizers 0.21.1. The training involved a multi-GPU setup with 8 devices and a gradient accumulation of 16 steps.