ReDiX/Qwen2.5-0.5B-Instruct-ITA

Params: 0.5B
Tensor type: BF16
Context length: 131,072 tokens
Date: Dec 2, 2024
License: apache-2.0
Overview

ReDiX/Qwen2.5-0.5B-Instruct-ITA is a 0.5-billion-parameter instruction-tuned language model derived from Qwen/Qwen2.5-0.5B-Instruct. ReDiX fine-tuned it on the ReDiX/DataForge dataset to improve its performance in Italian.
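The model follows the standard Qwen2.5 chat format and can be loaded with the transformers library. Below is a minimal usage sketch; the Italian prompt and generation settings are illustrative, not part of the model card.

```python
# Minimal usage sketch; weights are BF16, prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReDiX/Qwen2.5-0.5B-Instruct-ITA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user", "content": "Spiegami brevemente cos'è un modello linguistico."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```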

Key Capabilities

  • Italian Language Proficiency: The model shows improved performance on Italian-specific benchmarks, indicating better understanding and generation of Italian text.
  • Compact Size: With 0.5 billion parameters, it is a small language model (SLM), making it suitable for resource-constrained environments or applications where efficiency is critical.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and instructions effectively.

Performance Metrics

The model achieved the following results on the evaluation set (acc = accuracy, acc_norm = normalized accuracy; a reproduction sketch follows the list):

  • Loss: 1.4100
  • ARC-IT: 0.2378 (acc), 0.2823 (acc_norm)
  • HellaSwag-IT: 0.3163 (acc), 0.3800 (acc_norm)
  • M-MMLU-IT: 0.381 (acc)
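
These benchmark names match the Italian tasks (arc_it, hellaswag_it, m_mmlu_it) available in EleutherAI's lm-evaluation-harness. Assuming that harness was used (the model card does not say), the scores could be approximately reproduced with a sketch like this:

```python
# Hedged sketch: assumes lm-evaluation-harness (pip install lm-eval) and that
# ARC-IT / HellaSwag-IT / M-MMLU-IT map to the arc_it / hellaswag_it / m_mmlu_it tasks.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ReDiX/Qwen2.5-0.5B-Instruct-ITA,dtype=bfloat16",
    tasks=["arc_it", "hellaswag_it", "m_mmlu_it"],
)
for task, metrics in results["results"].items():
    print(task, metrics)
```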

Training Details

The model was trained for 2 epochs with a learning rate of 0.0001, using an 8-bit AdamW optimizer and a cosine learning rate scheduler. Training used a per-device batch size of 4 with 4 gradient accumulation steps, for an effective batch size of 16, and Flash Attention was enabled.
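
Expressed as a Hugging Face TrainingArguments configuration, those hyperparameters would look roughly like the sketch below. The optimizer name, output directory, and base-model loading call are illustrative assumptions; the actual training script is not published on this card.

```python
# Minimal sketch of the stated hyperparameters; assumes the transformers
# Trainer stack with bitsandbytes providing the 8-bit AdamW optimizer.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # Flash Attention, as stated
)

args = TrainingArguments(
    output_dir="qwen2.5-0.5b-instruct-ita",  # hypothetical path
    num_train_epochs=2,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",                  # 8-bit AdamW via bitsandbytes
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,           # 4 x 4 = effective batch size 16
    bf16=True,
)
```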

Good for

  • Applications requiring a small, efficient language model for Italian text processing.
  • Tasks involving instruction following in Italian.
  • Use cases where enhanced Italian language understanding and generation are prioritized over broad multilingual capabilities.