raalr/qwen3-1.7b-arabic-standard-kd
Text generation · Concurrency cost: 1 · Model size: 2B · Quant: BF16 · Context length: 32k · Published: Mar 29, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights

The raalr/qwen3-1.7b-arabic-standard-kd model is a 2-billion-parameter language model fine-tuned from Qwen/Qwen3-1.7B-Base. It is adapted specifically for Arabic-language tasks using knowledge distillation, and is intended for applications that need a compact yet capable model for standard Arabic text processing. It supports a context length of 32,768 tokens.


Model Overview

raalr/qwen3-1.7b-arabic-standard-kd is a 2-billion-parameter language model derived from the Qwen3-1.7B-Base architecture. It has been fine-tuned for standard Arabic text, with knowledge distillation (the "kd" in the name) likely used to preserve performance in a small footprint. The specific dataset used for fine-tuning is not detailed in the available information.
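A minimal inference sketch with the transformers library is shown below. It assumes the model is available on the Hugging Face Hub under this ID; the prompt and generation settings are illustrative, not taken from the model card.

```python
# Minimal text-generation sketch for raalr/qwen3-1.7b-arabic-standard-kd.
# Assumes the model ID resolves on the Hugging Face Hub; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "raalr/qwen3-1.7b-arabic-standard-kd"

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model (BF16 weights, per the card) and generate a completion."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Example Arabic prompt: "Write a sentence in Modern Standard Arabic."
    print(generate("اكتب جملة باللغة العربية الفصحى."))
```

The actual download and generation run only under the `__main__` guard, so the helper can be inspected or imported without fetching the weights.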

Training Details

The model was trained for 3 epochs with a learning rate of 2e-05 and an effective batch size of 16 (a per-device train_batch_size of 2 with gradient_accumulation_steps of 8). The optimizer was ADAMW_TORCH with standard betas and epsilon, paired with a cosine learning-rate scheduler and a warmup ratio of 0.05. During training, the validation loss decreased from an initial 2.6974 to a final reported 2.0547.
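The batch-size arithmetic and the learning-rate schedule above can be sketched as follows; the helper functions are hypothetical, written only to mirror the reported hyperparameters (peak LR 2e-05, warmup ratio 0.05, cosine decay).

```python
import math

# Reported hyperparameters from the training details above.
PEAK_LR = 2e-05
TRAIN_BATCH_SIZE = 2   # per-device batch size
GRAD_ACCUM_STEPS = 8   # gradient accumulation steps
WARMUP_RATIO = 0.05    # fraction of training spent warming up

def effective_batch_size(per_device: int, accum: int) -> int:
    """Batch size seen per optimizer step: per-device batch x accumulation."""
    return per_device * accum

def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup over the first WARMUP_RATIO of training,
    then cosine decay from PEAK_LR toward zero."""
    warmup_steps = max(1, int(total_steps * WARMUP_RATIO))
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))
```

With these values, `effective_batch_size(2, 8)` gives the reported total batch size of 16, and the schedule peaks at 2e-05 once warmup ends.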

Framework Versions

The training process utilized:

  • Transformers 5.4.0
  • PyTorch 2.10.0+cu128
  • Datasets 4.8.4
  • Tokenizers 0.22.2

Intended Uses & Limitations

Specific intended uses and limitations are not explicitly detailed. Given its base model and Arabic-focused fine-tuning, it is likely suitable for a range of Arabic natural language processing tasks. Users should be aware that the current model card provides no performance benchmarks and no information about the model's specific capabilities, potential biases, or limitations.