SemanticAlignment/Mistral-v0.1-Italian-LAPT-instruct
Mistral-v0.1-Italian-LAPT-instruct is a 7-billion-parameter instruction-tuned causal language model developed by SapienzaNLP, ISTI-CNR, and ILC-CNR. Based on the Mistral-7B-v0.1 architecture, it has been continually pre-trained and instruction-tuned primarily on Italian and English data, with a focus on Italian. The model targets Italian language understanding and generation, and improves on its base model across Italian benchmarks.
Mistral-v0.1-Italian-LAPT-instruct Overview
This model is part of the Mistral-7B-v0.1-Adapted collection, a series of 7B generative models derived from Mistral-7B-Base-v0.1. Developed by SapienzaNLP, ISTI-CNR, and ILC-CNR, this specific variant has undergone continual pre-training and instruction tuning to enhance its capabilities, particularly for the Italian language.
Key Adaptations and Training
The model's adaptation involved training on a custom dataset skewed towards Italian, comprising 9 billion tokens from the Italian part of CulturaX and 3 billion English tokens from the same source. For instruction tuning, a diverse mix of datasets was used, including TÜLU-v3, LIMA, WildChat-IT, TowerBlocks-v0.2, GPT-4o-ITA-Instruct, and Aya, with a significant portion being Italian-centric.
Performance and Use Cases
Evaluated on ITA-Bench, the LAPT-adapted model shows competitive performance on Italian language tasks. For instance, it achieves 52.9 on MMLU (5-shot) and 58.4 on HellaSwag (0-shot), outperforming the original Mistral-7B-v0.1 on these metrics. The model is well-suited for applications requiring robust Italian language understanding and generation, such as chatbots, content creation, and translation assistance in Italian contexts.
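For such applications, prompts need to follow the model's instruction format. Since this model derives from Mistral-7B-v0.1, a reasonable assumption is that it uses the Mistral `[INST]` chat format; the sketch below illustrates that format with a hypothetical helper, but in practice the tokenizer's own chat template (`tokenizer.apply_chat_template`) is authoritative and may differ.

```python
def format_mistral_prompt(messages):
    """Format a list of chat turns in the Mistral [INST] style.

    Hypothetical helper for illustration only; this assumes the model
    inherits Mistral-7B-v0.1's instruction format. Check the repository's
    tokenizer chat template before relying on this layout.
    """
    text = "<s>"
    for msg in messages:
        if msg["role"] == "user":
            # User turns are wrapped in [INST] ... [/INST] markers.
            text += f"[INST] {msg['content']} [/INST]"
        elif msg["role"] == "assistant":
            # Assistant turns follow the instruction and close with </s>.
            text += f" {msg['content']}</s>"
    return text


# Example: a single-turn Italian instruction.
prompt = format_mistral_prompt(
    [{"role": "user", "content": "Riassumi questo testo in italiano."}]
)
print(prompt)  # → <s>[INST] Riassumi questo testo in italiano. [/INST]
```

The resulting string is what would be tokenized and fed to the model for generation; multi-turn conversations simply alternate user and assistant entries in the list.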