baban/QwenTranslate_Hindi_English

TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kTool Calling:SupportedPublished:Aug 12, 2025License:otherArchitecture:Transformer Cold

The baban/QwenTranslate_Hindi_English model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct, specifically optimized for Hindi-English machine translation tasks. This model leverages a 32768-token context length and was trained with a focus on translation accuracy, achieving a loss of 0.9161 on its evaluation set. It is designed for applications requiring robust translation capabilities between Hindi and English.

Loading preview...

Model Overview

The baban/QwenTranslate_Hindi_English model is a specialized machine translation model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct base architecture. With 3.1 billion parameters and a substantial 32768-token context length, this model is engineered for high-quality translation between Hindi and English.

Key Capabilities

  • Hindi-English Machine Translation: The primary function of this model is to translate text between Hindi and English, leveraging its fine-tuned training on the MT_Hindi_En dataset.
  • Qwen2.5-3B-Instruct Foundation: Built upon the robust Qwen2.5-3B-Instruct model, it inherits strong language understanding and generation capabilities, adapted for translation.
  • Optimized Training: The model was trained using specific hyperparameters, including a learning rate of 5e-05, a total batch size of 1024, and 3 epochs, resulting in an evaluation loss of 0.9161.

Intended Use Cases

This model is particularly well-suited for applications requiring accurate and efficient translation between Hindi and English. Potential use cases include:

  • Document Translation: Translating documents, articles, or web content between the two languages.
  • Cross-Lingual Communication: Facilitating communication in environments where both Hindi and English are spoken.
  • Content Localization: Adapting content for Hindi or English-speaking audiences.

Training Details

The model was trained using Transformers 4.55.0, Pytorch 2.5.1+cu124, Datasets 3.6.0, and Tokenizers 0.21.1. The training involved a multi-GPU setup with 8 devices and a gradient accumulation of 16 steps.