baban/QwenTranslate_English_German
The baban/QwenTranslate_English_German model is a 3.1 billion parameter language model fine-tuned from Qwen/Qwen2.5-3B-Instruct. Developed by baban, this model specializes in English to German machine translation. It leverages a 32768-token context length, making it suitable for translation tasks requiring substantial input. The model's primary strength lies in its optimized performance for English-German translation, as indicated by its training on the MT_En_German dataset.
Loading preview...
Overview
The baban/QwenTranslate_English_German model is a specialized machine translation model, fine-tuned from the Qwen/Qwen2.5-3B-Instruct base model. With 3.1 billion parameters and a substantial 32768-token context length, it is designed for English to German translation tasks.
Key Capabilities
- English to German Translation: The model is specifically fine-tuned on the MT_En_German dataset, indicating its primary capability in translating text from English to German.
- Qwen2.5 Architecture: Built upon the Qwen2.5-3B-Instruct foundation, it inherits the general language understanding and generation capabilities of its base model, adapted for translation.
- Large Context Window: A 32768-token context length allows for processing longer texts, which can be beneficial for maintaining coherence and accuracy in translation.
Training Details
The model was trained with a learning rate of 5e-05 over 3 epochs, utilizing a distributed training setup across 8 devices. The training process involved an AdamW optimizer and an inverse square root learning rate scheduler. Evaluation during training showed a loss of 0.7843.
Intended Uses
This model is best suited for applications requiring direct English to German text translation. Its fine-tuned nature suggests improved performance for this specific language pair compared to general-purpose models.