ReDiX/Qwen2.5-0.5B-Instruct-ITA
Text generation · 0.5B parameters · BF16 · 32k context · Published: Dec 2, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

ReDiX/Qwen2.5-0.5B-Instruct-ITA is a 0.5 billion parameter instruction-tuned causal language model, fine-tuned by ReDiX from Qwen/Qwen2.5-0.5B-Instruct. This model specializes in Italian language tasks, demonstrating improved performance on Italian evaluation benchmarks such as ARC-IT, HellaSwag-IT, and M-MMLU-IT. It is designed for applications requiring a compact, efficient model with enhanced Italian language understanding and generation capabilities.


Overview

ReDiX/Qwen2.5-0.5B-Instruct-ITA is a 0.5 billion parameter instruction-tuned language model, derived from the Qwen/Qwen2.5-0.5B-Instruct base model. It has been specifically fine-tuned by ReDiX using the ReDiX/DataForge dataset to enhance its performance in the Italian language.

Key Capabilities

  • Italian Language Proficiency: The model shows improved performance on Italian-specific benchmarks, indicating better understanding and generation of Italian text.
  • Compact Size: With 0.5 billion parameters, it is a small language model (SLM), making it suitable for resource-constrained environments or applications where efficiency is critical.
  • Instruction Following: As an instruction-tuned model, it is designed to follow user prompts and instructions effectively.
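As an instruction-tuned Qwen2.5 derivative, the model can be queried through the standard `transformers` chat workflow. The sketch below is illustrative rather than taken from the model card: the Italian prompt and generation settings are assumptions, and it assumes the model ships with the usual Qwen2.5 chat template.

```python
# Hypothetical inference sketch; prompt and decoding settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ReDiX/Qwen2.5-0.5B-Instruct-ITA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Build a chat-formatted prompt via the model's own chat template.
messages = [
    {"role": "user", "content": "Spiega in breve cos'è il Colosseo."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
response = tokenizer.decode(
    outputs[0][inputs.shape[-1]:], skip_special_tokens=True
)
print(response)
```

Because the model is only 0.5B parameters, this runs comfortably on CPU as well as GPU.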

Performance Metrics

The model achieved the following results on its evaluation set:

  • Loss: 1.4100
  • ARC-IT: 0.2378 (acc), 0.2823 (acc_norm)
  • HellaSwag-IT: 0.3163 (acc), 0.3800 (acc_norm)
  • M-MMLU-IT: 0.3810 (acc)

Training Details

The model was trained for 2 epochs with a learning rate of 0.0001, using an 8-bit AdamW optimizer and a cosine learning rate scheduler. Training used a per-device batch size of 4 with 4 gradient accumulation steps, for a total effective batch size of 16. Flash Attention was enabled during training.
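The reported hyperparameters can be summarized as a configuration fragment. This is a sketch for clarity only: the key names mirror common Hugging Face trainer conventions and are not taken from ReDiX's actual training setup, and a single-device run is assumed when computing the effective batch size.

```python
# Hypothetical training configuration mirroring the hyperparameters
# reported above; key names are illustrative, not from the actual run.
training_config = {
    "num_train_epochs": 2,
    "learning_rate": 1e-4,
    "optimizer": "adamw_8bit",          # 8-bit AdamW
    "lr_scheduler_type": "cosine",
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "flash_attention": True,
}

# Effective batch size = per-device batch * accumulation steps
# (assuming a single device), matching the reported total of 16.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)  # → 16
```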

Good for

  • Applications requiring a small, efficient language model for Italian text processing.
  • Tasks involving instruction following in Italian.
  • Use cases where enhanced Italian language understanding and generation are prioritized over broad multilingual capabilities.