Overview
Latxa 3.1 8B Instruct: Basque Language Adaptation
Latxa 3.1 8B Instruct is an instruction-tuned large language model developed by the HiTZ Research Center & IXA Research group, building upon Meta's Llama-3.1-8B-Instruct. This model addresses the performance gap for low-resource languages by undergoing extensive language adaptation using a 4.2 billion token Basque corpus (Etxaniz et al., 2024).
Key Capabilities
- Basque Language Proficiency: Demonstrates substantial performance improvements over Llama-3.1-Instruct on standard Basque benchmarks, particularly in chat conversations.
- Instruction Following: Designed to follow instructions and function effectively as a chat assistant in Basque.
- Competitive Performance: Preliminary evaluations, including an arena-based assessment, show Latxa 3.1 8B Instruct ranking highly against other models, including proprietary ones like GPT-4o and Claude Sonnet, for Basque tasks.
Evaluation Highlights
Latxa 3.1 8B Instruct shows significant gains across various Basque datasets compared to Llama-3.1 8B Instruct:
- Belebele: 80.00% accuracy (vs. 73.89% for Llama-3.1 8B Instruct)
- X-Story Cloze: 71.34% accuracy (vs. 61.22%)
- EusProficiency: 52.83% accuracy (vs. 34.13%)
- EusReading: 62.78% accuracy (vs. 49.72%)
- EusTrivia: 61.05% accuracy (vs. 45.01%)
- EusExams: 56.00% accuracy (vs. 46.21%)
Good for
- Applications requiring high-quality language generation and understanding in Basque.
- Developing chatbots and conversational AI systems for Basque speakers.
- Research and development in low-resource language NLP, specifically for Basque.