baban/QwenTranslate_English_Hindi_100K_SFT
The baban/QwenTranslate_English_Hindi_100K_SFT model is a 3.1 billion parameter, 32768-token context length language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by baban, this model is specifically optimized for English-to-Hindi translation tasks, leveraging the MT_En_Hindi dataset. It is designed to provide specialized translation capabilities between these two languages.
Loading preview...
Model Overview
The baban/QwenTranslate_English_Hindi_100K_SFT is a specialized language model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct architecture. With 3.1 billion parameters and a context length of 32768 tokens, this model is specifically adapted for translation tasks between English and Hindi.
Key Capabilities
- English-Hindi Translation: The model's primary function is to translate text from English to Hindi, having been fine-tuned on the
MT_En_Hindidataset. - Qwen2.5-3B-Instruct Base: Benefits from the foundational capabilities of the Qwen2.5-3B-Instruct model, providing a robust base for its specialized translation function.
Training Details
The model was trained with a learning rate of 5e-05, a total batch size of 1024 (across 8 GPUs with 32 gradient accumulation steps), and an inverse square root learning rate scheduler. Training was conducted for 3 epochs, achieving a validation loss of 0.4726.
Intended Use Cases
This model is best suited for applications requiring accurate and efficient translation of text from English to Hindi, particularly in scenarios where a specialized, fine-tuned model can outperform more general-purpose language models for this specific language pair.