Latxa 7B: A Specialized LLM for Basque
Latxa 7B is a 7-billion-parameter Large Language Model developed by the HiTZ Research Center & IXA Research Group to address the limitations of general-purpose LLMs on low-resource languages such as Basque. It is built on Meta's LLaMA 2 architecture and was further trained on EusCrawl, a highly curated Basque corpus, to improve performance on Basque language tasks.
Key Capabilities & Features
- Basque Language Specialization: Significantly outperforms general LLMs (like LLaMA 2 7B, BLOOM 7B, XGLM 7B) on various Basque-specific benchmarks, including reading comprehension, commonsense reasoning, sentiment analysis, and topic classification.
- Foundation Model: Released as a pre-trained LLM, suitable for direct prompting or further fine-tuning for specific Basque-centric use cases.
- Multilingual Support: Primarily focused on Basque (eu), with some English (en) data included during training to prevent catastrophic forgetting.
- Reproducibility: This specific version (v1) is provided for reproducibility, with newer versions available in the Latxa Collection.
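Because Latxa 7B is a pre-trained model rather than an instruction-tuned one, direct prompting usually works best as text completion, for example with a few labeled examples followed by an unlabeled query. The sketch below illustrates that pattern with the Hugging Face `transformers` library. It rests on several assumptions that are not stated in this card: the repository id `HiTZ/latxa-7b-v1`, the `Testua:`/`Sentimendua:` prompt template, the example sentences and labels, and the helper names `build_few_shot_prompt` and `complete`. Treat it as a starting point, not as the authors' evaluation setup.

```python
from typing import List, Tuple

# Assumed Hugging Face repository id for this v1 checkpoint; verify before use.
MODEL_ID = "HiTZ/latxa-7b-v1"


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (text, label) pairs plus an unlabeled query as a completion prompt.

    The "Testua"/"Sentimendua" field names are an illustrative template,
    not a format prescribed by the model authors.
    """
    blocks = [f"Testua: {text}\nSentimendua: {label}" for text, label in examples]
    blocks.append(f"Testua: {query}\nSentimendua:")
    return "\n\n".join(blocks)


def complete(prompt: str, max_new_tokens: int = 8) -> str:
    """Greedily generate a continuation and return only the new text."""
    # Heavy imports are deferred so the prompt helper works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the `accelerate` package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    # Illustrative sentiment examples (hypothetical data, not from a benchmark).
    shots = [
        ("Film hau oso ona da.", "positiboa"),
        ("Zerbitzua txarra izan zen.", "negatiboa"),
    ]
    prompt = build_few_shot_prompt(shots, "Liburu hau asko gustatu zait.")
    print(complete(prompt))
```

Running the script downloads the full 7B checkpoint, so it needs a GPU with enough memory for bfloat16 weights, or a quantized loading path. The same prompt-building approach can be adapted to topic classification or reading comprehension by changing the field names and examples.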
Ideal Use Cases
- Developing applications and research for the Basque language.
- Tasks requiring high accuracy in Basque text generation, understanding, and analysis.
- Fine-tuning for specific downstream applications in Basque, such as chatbots, content creation, or information extraction.