Overview
ReDiX/Qwen2.5-0.5B-Instruct-ITA is a 0.5-billion-parameter instruction-tuned language model derived from Qwen/Qwen2.5-0.5B-Instruct. It was fine-tuned by ReDiX on the ReDiX/DataForge dataset to improve its performance in Italian.
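The model can be loaded with the Hugging Face transformers library. The snippet below is a minimal usage sketch: the model ID comes from this card, while the Italian prompt and generation settings are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReDiX/Qwen2.5-0.5B-Instruct-ITA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the prompt with the model's own chat template, then generate.
messages = [
    {"role": "user", "content": "Spiega in una frase cos'è un modello linguistico."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```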
Key Capabilities
- Italian Language Proficiency: The model shows improved performance on Italian-specific benchmarks, indicating better understanding and generation of Italian text.
- Compact Size: With 0.5 billion parameters, it is a small language model (SLM), well suited to resource-constrained environments and applications where efficiency is critical.
- Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and instructions effectively.
Performance Metrics
On the evaluation set, the model achieved the following results:
- Loss: 1.4100
- ARC-IT: 0.2378 (acc), 0.2823 (acc_norm)
- HellaSwag-IT: 0.3163 (acc), 0.3800 (acc_norm)
- M-MMLU-IT: 0.381 (acc)
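These acc/acc_norm scores follow the conventions of EleutherAI's lm-evaluation-harness. Assuming the multilingual Okapi task names (arc_it, hellaswag_it, m_mmlu_it) shipped with recent harness releases, a comparable evaluation run could look roughly like the sketch below; the few-shot configuration behind this card's numbers is not stated, so harness defaults are assumed.

```python
import lm_eval

# Evaluate the model on the Italian tasks reported above. Task names are
# the multilingual Okapi tasks; few-shot settings are harness defaults.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=ReDiX/Qwen2.5-0.5B-Instruct-ITA",
    tasks=["arc_it", "hellaswag_it", "m_mmlu_it"],
)

# Print acc / acc_norm per task.
for task, metrics in results["results"].items():
    print(task, metrics)
```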
Training Details
The model was trained for 2 epochs with a learning rate of 0.0001, using an 8-bit AdamW optimizer and a cosine learning-rate scheduler. Training used a per-device batch size of 4 with 4 gradient accumulation steps, for an effective batch size of 16, and Flash Attention was enabled.
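These hyperparameters map onto Hugging Face TrainingArguments as sketched below. This is a reconstruction from the stated numbers, not the authors' training script; the use of the vanilla transformers Trainer stack, the bfloat16 dtype, and the output path are assumptions.

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Flash Attention is enabled at model load time (assumption: via the
# standard transformers attn_implementation flag, which requires fp16/bf16).
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype="bfloat16",
    attn_implementation="flash_attention_2",
)

# Hyperparameters copied from the description above; output_dir is hypothetical.
args = TrainingArguments(
    output_dir="qwen2.5-0.5b-instruct-ita",
    num_train_epochs=2,
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # 4 x 4 = effective batch size of 16
    lr_scheduler_type="cosine",
    optim="adamw_bnb_8bit",         # 8-bit AdamW via bitsandbytes
)
```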
Good for
- Applications requiring a small, efficient language model for Italian text processing.
- Tasks involving instruction following in Italian.
- Use cases where enhanced Italian language understanding and generation are prioritized over broad multilingual capabilities.