baban/QwenTranslate_English_Tamil_100K_SFT

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Jun 22, 2026License:otherArchitecture:Transformer Cold

The baban/QwenTranslate_English_Tamil_100K_SFT model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. It is specifically optimized for English-to-Tamil translation tasks, leveraging the MT_En_Tamil dataset. This model provides specialized translation capabilities between English and Tamil, making it suitable for applications requiring accurate bilingual text conversion in these languages. It achieves a validation loss of 0.6514, indicating its proficiency in the targeted translation domain.

Loading preview...

Model Overview

The baban/QwenTranslate_English_Tamil_100K_SFT is a specialized language model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct architecture. With 3.1 billion parameters and a context length of 32768 tokens, this model is engineered for English-to-Tamil translation.

Key Capabilities

  • Dedicated Translation: Optimized for high-quality translation between English and Tamil.
  • Fine-tuned Performance: Achieves a validation loss of 0.6514 on the evaluation set, demonstrating its effectiveness in the target language pair.
  • Base Model: Built upon the robust Qwen2.5-3B-Instruct foundation, inheriting its general language understanding capabilities.

Training Details

The model was trained using the MT_En_Tamil dataset with specific hyperparameters including a learning rate of 5e-05, a total batch size of 1024, and 3 epochs. The training utilized 8 GPUs with gradient accumulation steps of 16, employing the AdamW_Torch optimizer and an inverse_sqrt learning rate scheduler.

Good For

  • Applications requiring accurate and efficient English-to-Tamil text translation.
  • Developers building tools or services for Tamil-speaking audiences or content in Tamil.
  • Research into low-resource language translation, specifically for English-Tamil pairs.