HiTZ/latxa-70b-v1.2
Latxa-70b-v1.2 is a 69 billion parameter large language model developed by the HiTZ Research Center & IXA Research group, based on Meta's Llama 2 architecture. It received continued pretraining on a 4.2 billion token Basque corpus, which makes it proficient in Basque language understanding and generation. It outperforms other open models on various Basque-specific benchmarks.
Latxa-70b-v1.2: A Specialized LLM for Basque
Latxa-70b-v1.2 is a 69 billion parameter large language model developed by the HiTZ Research Center & IXA Research group, building upon Meta's Llama 2 architecture. This model was specifically designed to address the performance gap for low-resource languages like Basque in the LLM landscape. It underwent continued pretraining on the high-quality Latxa Corpus v1.1, comprising 4.3 million documents and 4.2 billion tokens of Basque data, with an additional 500K English documents from the Pile to prevent catastrophic forgetting.
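To put the data mix in proportion, the short sketch below derives two rough figures from the numbers quoted above: the share of English documents in the combined document count, and the average number of tokens per Basque document. It assumes the 500K English documents are counted in addition to the 4.3 million Basque ones, as the wording suggests. The function and variable names are illustrative only.

```python
# Figures quoted in the model description above.
BASQUE_DOCS = 4_300_000        # documents in Latxa Corpus v1.1
BASQUE_TOKENS = 4_200_000_000  # tokens of Basque data
ENGLISH_DOCS = 500_000         # additional English documents from the Pile


def corpus_stats(basque_docs: int, basque_tokens: int, english_docs: int) -> dict:
    """Derive rough proportions of the continued-pretraining data mix."""
    total_docs = basque_docs + english_docs
    return {
        # Fraction of all training documents that are English.
        "english_doc_share": english_docs / total_docs,
        # Mean length of a Basque document, in tokens.
        "avg_tokens_per_basque_doc": basque_tokens / basque_docs,
    }


stats = corpus_stats(BASQUE_DOCS, BASQUE_TOKENS, ENGLISH_DOCS)
print(f"English share of documents: {stats['english_doc_share']:.1%}")
print(f"Average tokens per Basque document: {stats['avg_tokens_per_basque_doc']:.0f}")
```

Under that assumption, English makes up roughly a tenth of the documents, and a Basque document averages a little under a thousand tokens.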
Key Capabilities
- Exceptional Basque Language Proficiency: Latxa-70b-v1.2 significantly outperforms previous open models in Basque language tasks, demonstrating strong understanding and generation capabilities.
- Competitive with GPT-4 Turbo: Achieves competitive results with GPT-4 Turbo in Basque language proficiency and understanding, though it lags in reading comprehension and knowledge-intensive tasks.
- Pretrained LLM: A base (pretrained) language model, suitable for direct prompting or for further fine-tuning on specific Basque-centric applications.
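Because this is a base model rather than an instruction-tuned one, direct prompting usually means giving it a pattern to continue. The sketch below shows one way to do that: a helper that assembles a few-shot prompt, and an optional generation function built on the Hugging Face `transformers` API. The task (sentiment labelling), the Basque example sentences, the field labels, and the helper names are all illustrative assumptions, not something the model card prescribes. Loading a model of this size needs substantial GPU memory, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
def build_few_shot_prompt(examples, query,
                          input_label="Testua", output_label="Sentimendua"):
    """Format (text, label) pairs plus an unanswered query as a few-shot prompt.

    The prompt ends with an open output field for the base model to complete.
    """
    blocks = [f"{input_label}: {text}\n{output_label}: {label}"
              for text, label in examples]
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


def generate(prompt, model_id="HiTZ/latxa-70b-v1.2", max_new_tokens=5):
    """Continue the prompt with the model (hypothetical helper; heavy to run)."""
    # Imported lazily so the prompt helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Illustrative examples only; labels and sentences are assumptions.
examples = [
    ("Film hau oso ona da.", "positiboa"),
    ("Zerbitzua txarra izan zen.", "negatiboa"),
]
prompt = build_few_shot_prompt(examples, "Liburua asko gustatu zait.")
print(prompt)
```

Calling `generate(prompt)` would then ask the model to fill in the final label. Keeping `max_new_tokens` small helps stop a base model from continuing into further invented examples.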
Good for
- Basque Language Applications: Suited to use cases that require strong performance in Basque, including text generation, analysis, and understanding.
- Research and Development in Low-Resource Languages: Promotes research and technological development for the Basque language and offers insights for other low-resource languages.
Latxa models are not instruction-tuned or designed as chat assistants, and their performance is not guaranteed for languages other than Basque.