Zefiro-7b-dpo-ITA: A DPO Fine-Tuned Italian LLM
Zefiro-7b-dpo-ITA is a 7-billion-parameter GPT-style causal language model developed by giux78, fine-tuned specifically for Italian using Direct Preference Optimization (DPO). It builds on the Zefiro-7b-sft-ITA model and draws inspiration from the Zephyr and LLaMAntino models.
Key Capabilities & Training:
- Italian Language Specialization: Primarily focused on Italian, making it highly effective for Italian NLP tasks.
- DPO Fine-Tuning: Utilizes DPO on a filtered version of the ultrafeedback-preferences-ITA dataset, enhancing its conversational abilities.
- Performance: Achieves competitive scores on Italian benchmarks, with an average of 56.86 across ARC-c, HellaSwag, and MMLU, outperforming both its base and SFT predecessors.
- Training Data: Trained using a translated version of the UltraChat dataset, with careful consideration for translation quality.
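The DPO objective mentioned above trains the policy to prefer the chosen response over the rejected one in each preference pair, relative to a frozen reference model (here, the SFT checkpoint). A minimal sketch of the per-pair loss, assuming the standard DPO formulation with summed response log-probabilities as inputs (the function name and argument names are illustrative, not from the model card):

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Arguments are summed log-probabilities of the chosen / rejected
    responses under the policy being trained and under the frozen
    reference (SFT) model. beta scales the implicit reward margin.
    """
    logits = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(logits)), computed in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# When the policy favors the chosen answer more strongly than the
# reference does, the loss falls below log(2); otherwise it rises above.
low = dpo_loss(pol_chosen=-10, pol_rejected=-20, ref_chosen=-12, ref_rejected=-18)
high = dpo_loss(pol_chosen=-12, pol_rejected=-18, ref_chosen=-10, ref_rejected=-20)
```

In practice this loss is applied in batches over the filtered ultrafeedback-preferences-ITA pairs; the sketch only shows the scalar objective being minimized.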
Intended Uses:
- Conversational AI: Ideal as a base model for developing more specific conversational agents in Italian.
- Italian NLP Applications: Suitable for various tasks requiring strong Italian language understanding and generation.
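For conversational use, inputs need to be serialized into the model's chat format. Since the card says Zefiro builds on Zephyr, a reasonable assumption is Zephyr's `<|system|>` / `<|user|>` / `<|assistant|>` turn template; the helper below is a hedged sketch of that format (verify against the model's own tokenizer chat template before relying on it):

```python
def build_prompt(messages):
    """Serialize a chat history into a Zephyr-style prompt string.

    Assumes Zefiro inherits Zephyr's template, where each turn is
    "<|role|>\n<content></s>\n" and generation starts after a final
    "<|assistant|>\n" tag. This is an assumption, not confirmed by
    the model card.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    return "".join(parts) + "<|assistant|>\n"

prompt = build_prompt([
    {"role": "system", "content": "Sei un assistente che risponde in italiano."},
    {"role": "user", "content": "Qual e' la capitale d'Italia?"},
])
```

With the Hugging Face `transformers` library, the equivalent and safer route is `tokenizer.apply_chat_template(messages, add_generation_prompt=True)`, which reads the template shipped with the model instead of hard-coding it.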
Limitations:
- The model has not been aligned with human preferences for safety beyond the DPO phase, so it can produce problematic outputs when prompted to do so.
- The exact composition of the base model's training corpus is unknown, but likely includes web data and technical sources.