tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:3.1BQuant:BF16Ctx Length:32kPublished:Jul 22, 2025Architecture:Transformer0.0K Warm

The tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb model is a 3.1 billion parameter instruction-tuned causal language model, developed by TartuNLP, based on Qwen2.5-3B-Instruct. It is specifically adapted for Upper Sorbian (hsb) and Lower Sorbian (dsb) through continued pretraining on Sorbian monolingual and parallel data. This model jointly supports machine translation and question answering for both Sorbian languages, achieving the top rank in the WMT25 Shared Task on Limited Resource Slavic Languages.

Loading preview...

Overview

The tartuNLP/Qwen2.5-3B-Instruct-hsb-dsb model is a specialized 3.1 billion parameter instruction-tuned language model, developed by TartuNLP. It is built upon the Qwen2.5-3B-Instruct architecture and has been extensively adapted for Upper Sorbian (hsb) and Lower Sorbian (dsb). This adaptation involved continued pretraining on Sorbian monolingual and parallel datasets, combined with general instruction-tuning data.

Key Capabilities

  • Bilingual Sorbian Support: Jointly handles both Upper Sorbian and Lower Sorbian.
  • Dual Task Proficiency: Excels in both machine translation (MT) and question answering (QA) for the Sorbian languages.
  • WMT25 Shared Task Winner: Achieved the highest ranking in the WMT25 Shared Task on Limited Resource Slavic Languages for both hsb and dsb tracks, demonstrating strong performance in both MT and QA.

Performance Highlights

In the WMT25 Shared Task, TartuNLP's model secured the top position:

  • Upper Sorbian (hsb): Achieved 86.33 for DE-HSB translation and 58.10 for HSB-QA, leading in QA and tying for translation.
  • Lower Sorbian (dsb): Achieved 78.20 for DE-DSB translation and 57.56 for DSB-QA, leading in QA and tying for translation.

Training Details

The model was trained on approximately 1.2 billion tokens with a sequence length of 4096, utilizing AMD MI250x GPUs on the LUMI supercomputer for about 139 GPU-hours.

Important Note

This model is primarily research-focused and has not been extensively tested for general-purpose usage. Users should exercise caution and conduct their own evaluations for specific applications.