baban/QwenTranslate_English_Bengali
The baban/QwenTranslate_English_Bengali model is a 3.1-billion-parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct and optimized specifically for English-Bengali machine translation. Its 32768-token context length makes it suitable for translating longer input sequences.
Overview
The baban/QwenTranslate_English_Bengali model is a specialized language model derived from the Qwen/Qwen2.5-3B-Instruct architecture. With 3.1 billion parameters and a substantial context length of 32768 tokens, this model has been fine-tuned on the MT_En_Bengali dataset.
Key Capabilities
- English-Bengali Translation: The model's primary function is to translate text between English and Bengali, having been specifically optimized for this language pair.
- Large Context Window: Benefits from a 32768 token context length, which can be advantageous for translating longer sentences or paragraphs while maintaining coherence.
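A translation call might look like the sketch below. This is a hypothetical usage example, not taken from the model card: the chat-style prompt (system instruction plus user text) is an assumption carried over from the Qwen2.5-Instruct base model, and the fine-tune may expect a different instruction wording.

```python
# Hypothetical inference sketch for baban/QwenTranslate_English_Bengali.
# The prompt format is assumed from the Qwen2.5-Instruct chat template.
def build_messages(text: str) -> list:
    """Wrap an English sentence in a chat-style translation request."""
    return [
        {"role": "system",
         "content": "Translate the following English text to Bengali."},
        {"role": "user", "content": text},
    ]

def translate(text: str,
              model_id: str = "baban/QwenTranslate_English_Bengali") -> str:
    # Imported lazily so build_messages stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = tokenizer.apply_chat_template(
        build_messages(text), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For sentence- or paragraph-level translation the defaults above should suffice; the large context window mainly matters when batching or translating long passages in one call.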
Training Details
Training used a learning rate of 5e-05, a total batch size of 1024 (8 devices with 16 gradient accumulation steps), and 3 epochs, with the adamw_torch optimizer and an inverse_sqrt learning-rate scheduler. The final reported loss on the training set was 0.3472.
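These hyperparameters can be reconstructed as a small configuration sketch. Note one assumption: the card gives only the total batch size, devices, and accumulation steps, so the per-device batch size of 8 is inferred (8 devices × 8 per device × 16 accumulation steps = 1024).

```python
# Reconstructed training hyperparameters, in the naming convention of
# Hugging Face TrainingArguments. PER_DEVICE_BATCH is an assumption
# derived from the stated total batch size; it is not in the card.
NUM_DEVICES = 8
PER_DEVICE_BATCH = 8      # assumed: 1024 / (8 devices * 16 accum steps)
GRAD_ACCUM_STEPS = 16

training_kwargs = dict(
    learning_rate=5e-05,
    per_device_train_batch_size=PER_DEVICE_BATCH,
    gradient_accumulation_steps=GRAD_ACCUM_STEPS,
    num_train_epochs=3,
    optim="adamw_torch",
    lr_scheduler_type="inverse_sqrt",
)

# Effective global batch size seen by the optimizer per update.
total_batch = NUM_DEVICES * PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # 1024
```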
Intended Use
This model is intended for applications requiring machine translation between English and Bengali. Because it has been fine-tuned specifically for this language pair, it should perform better on English-Bengali translation than general-purpose instruction models of similar size.