cerbero-7b: A Dedicated Italian LLM
cerbero-7b is a 7-billion-parameter, fully fine-tuned Italian Large Language Model developed by galatolo. It is built on the Mistral-7B base model and trained on the specialized Cerbero Dataset, which was generated using a dynamic self-chat method. A notable variant, cerbero-7b-openchat, is also available; it is based on openchat3.5, a base model that reports performance on par with or superior to ChatGPT 3.5.
Key Capabilities & Performance
- Superior Italian Language Proficiency: Significantly outperforms other Italian LLMs like Fauno and Camoscio on Italian-specific benchmarks.
- Question Answering: Achieves 72.55% F1 score and 55.6% Exact Match on SQuAD-it.
- EVALITA Benchmarks: Demonstrates strong performance in Toxicity Detection (63.04% F1), Irony Detection (48.51% F1), and Sentiment Analysis (61.80% F1).
- Full Fine-tuning: Unlike LoRA- or QLoRA-based adapter approaches, cerbero-7b updates all model weights during training on an expansive synthetic Italian dataset, with an 8192-token context window.
- Permissive Licensing: Released under the Apache 2.0 license, allowing unrestricted commercial and research use.
Use Cases & Integration
- Italian AI Applications: Ideal for developing advanced AI solutions that require deep understanding and generation of Italian text.
- Research and Commercial Projects: Suitable for both academic research and deployment in commercial products due to its open license.
- Easy Integration: Compatible with Hugging Face Transformers and llama.cpp for flexible deployment, including quantized GGUF versions for resource-constrained environments.
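As a minimal sketch of the Transformers integration, the snippet below loads the model and wraps a user message in conversational turn tags. The repository id `galatolo/cerbero-7b` and the `[|Umano|]`/`[|Assistente|]` prompt template are assumptions based on the model card; verify both before use.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in cerbero-7b's assumed conversational turn tags.

    The [|Umano|]/[|Assistente|] format follows the model card's described
    template; confirm against the current upstream documentation.
    """
    return (
        "Questa è una conversazione tra un umano ed un assistente AI.\n"
        f"[|Umano|] {user_message}\n"
        "[|Assistente|]"
    )


def generate(user_message: str, max_new_tokens: int = 128) -> str:
    """Lazily load cerbero-7b and generate a reply.

    Loading the full 7B model requires roughly 15 GB of memory in fp16;
    for constrained environments, prefer the quantized GGUF builds via
    llama.cpp instead.
    """
    # Imported here so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "galatolo/cerbero-7b"  # assumed Hugging Face repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(user_message), return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

The lazy import keeps the prompt-building logic usable (and testable) even on machines without `transformers` installed, deferring the heavyweight model download until `generate` is actually called.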
For more technical details, refer to the research paper on arXiv.