baban/QwenTranslate_Telugu_English
The baban/QwenTranslate_Telugu_English model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It is specifically optimized for machine translation between Telugu and English, leveraging a 32768 token context length. This model is designed for applications requiring accurate translation capabilities between these two languages.
Loading preview...
Overview
baban/QwenTranslate_Telugu_English is a specialized machine translation model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct architecture. With 3.1 billion parameters and a 32768 token context length, this model focuses on translating between Telugu and English.
Key Capabilities
- Telugu-English Machine Translation: The model is specifically trained on the MT_Telugu_En dataset to facilitate translation between these two languages.
- Leverages Qwen2.5-3B-Instruct Base: Benefits from the foundational capabilities of the Qwen 2.5 series, adapted for a specific translation task.
Training Details
The model was trained with a learning rate of 5e-05, a total batch size of 1024 (achieved with train_batch_size 8 and gradient_accumulation_steps 16), and ran for 3 epochs. It utilized an AdamW optimizer and an inverse square root learning rate scheduler. Evaluation during training showed a loss of 1.0262.
Intended Uses
- Applications requiring direct translation between Telugu and English.
- Integration into systems needing to process or generate text in both languages.