anakin87/Llama-3-8b-ita-ties-pro
anakin87/Llama-3-8b-ita-ties-pro is an 8 billion parameter language model based on the Llama 3 architecture, created by anakin87 using the TIES merge method. It combines two Italian LLMs, DeepMount00/Llama-3-8b-Ita and swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA, with Meta-Llama-3-8B-Instruct as its base. This model is specifically designed and optimized for Italian language tasks, offering a context length of 8192 tokens.
Model Overview
anakin87/Llama-3-8b-ita-ties-pro is an 8 billion parameter language model developed by anakin87. It was created using the TIES merge method, combining two specialized Italian language models: DeepMount00/Llama-3-8b-Ita and swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA, with Meta-Llama-3-8B-Instruct serving as the foundational base model.
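TIES merges of this kind are typically produced with mergekit. The sketch below shows the general shape of such a config; the `density` and `weight` values here are illustrative assumptions, not the actual recipe used for this model.

```yaml
# Illustrative mergekit-style TIES config -- parameter values are hypothetical
models:
  - model: DeepMount00/Llama-3-8b-Ita
    parameters:
      density: 0.5
      weight: 0.5
  - model: swap-uniba/LLaMAntino-3-ANITA-8B-Inst-DPO-ITA
    parameters:
      density: 0.5
      weight: 0.5
base_model: meta-llama/Meta-Llama-3-8B-Instruct
merge_method: ties
dtype: bfloat16
```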
Key Characteristics
- Architecture: Llama 3 family, 8 billion parameters.
- Merge Method: Uses TIES merging (TrIm, Elect Sign & Merge), a technique that combines the strengths of multiple fine-tuned models while reducing interference: it trims low-magnitude parameter changes, elects a dominant sign per parameter, and merges only the changes that agree with that sign.
- Italian Language Focus: Specifically engineered by merging models known for their performance in Italian, aiming to enhance capabilities for Italian-centric applications.
- Context Length: Supports an 8192-token context window.
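To make the TIES steps concrete, here is a simplified sketch on toy weight vectors (not the mergekit implementation): compute task vectors against the base, trim small deltas, elect a per-parameter sign, and average only the deltas that agree with it.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """Simplified TIES merge: trim, elect sign, merge agreeing deltas."""
    deltas = [ft - base for ft in finetuned]           # task vectors
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.size))              # keep top-k magnitudes
        thresh = np.sort(np.abs(d).ravel())[-k]
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    elected = np.sign(sum(trimmed))                    # elected sign per parameter
    agree = [np.where(np.sign(t) == elected, t, 0.0) for t in trimmed]
    counts = sum((np.sign(t) == elected) & (t != 0) for t in trimmed)
    return base + sum(agree) / np.maximum(counts, 1)   # mean of agreeing deltas

base = np.zeros(4)
merged = ties_merge(base, [np.array([1.0, -2.0, 0.1, 0.0]),
                           np.array([1.0,  2.0, 0.2, 0.0])])
# The conflicting second parameter cancels; the agreeing first one survives.
```

Note how the second parameter, where the two models pull in opposite directions, is zeroed out rather than averaged into a compromise, which is the key difference from a plain linear merge.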
Performance Metrics
Evaluations indicate competitive performance for Italian language tasks, with an average accuracy of 0.6110 across various benchmarks. Specific scores include:
- hellaswag_it (acc_norm): 0.6967
- arc_it (acc_norm): 0.5646
- m_mmlu_it (5-shot acc): 0.5717
For a comprehensive comparison, users can refer to the Leaderboard for Italian Language Models.
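As a quick sanity check, the reported average is the mean of the three benchmark scores:

```python
# Mean of the three reported Italian benchmark scores
scores = {
    "hellaswag_it acc_norm": 0.6967,
    "arc_it acc_norm": 0.5646,
    "m_mmlu_it 5-shot acc": 0.5717,
}
average = sum(scores.values()) / len(scores)
print(round(average, 4))  # 0.611, matching the reported 0.6110
```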
Use Cases
This model is particularly suitable for applications requiring strong performance in the Italian language, such as:
- Content generation in Italian.
- Italian text summarization and analysis.
- Chatbots or conversational AI systems interacting in Italian.
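For these chat-style uses, Llama 3 instruct models expect the Llama 3 prompt template. A minimal sketch of assembling a single-turn prompt by hand is shown below; in practice, `tokenizer.apply_chat_template` from `transformers` handles this for you, and the example Italian messages are just placeholders.

```python
def build_llama3_prompt(system: str, user: str) -> str:
    """Assemble a single-turn prompt in the Llama 3 instruct format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_llama3_prompt(
    "Sei un assistente che risponde in italiano.",   # "You are an assistant that answers in Italian."
    "Riassumi questo testo in tre frasi.",           # "Summarize this text in three sentences."
)
```

Generation should then continue from the trailing assistant header, stopping at the `<|eot_id|>` token.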