HiTZ/latxa-7b-v1.1

Text Generation · Model size: 7B · Quantization: FP8 · Context length: 4k · Published: Feb 16, 2024 · License: llama2 · Architecture: Transformer · Open Weights

HiTZ/latxa-7b-v1.1 is a 7-billion-parameter language model developed by the HiTZ Research Center and the IXA research group (University of the Basque Country UPV/EHU), based on Meta's Llama 2 architecture. It was further pre-trained on a 4.2-billion-token Basque corpus, making it highly proficient in Basque. The model excels at Basque language understanding and generation, outperforming previous open models for this low-resource language, and supports a 4,096-token context length.


Latxa 7B v1.1: A Specialized LLM for Basque

Latxa 7B v1.1 is a 7-billion-parameter Large Language Model (LLM) developed by the HiTZ Research Center and the IXA research group, building on Meta's Llama 2 architecture. It addresses the performance gap for low-resource languages through extensive continued pre-training on a dedicated Basque corpus of 4.3 million documents and 4.2 billion tokens. The model significantly outperforms other open models on Basque language tasks and is competitive in language proficiency with much larger models such as GPT-4 Turbo, particularly in understanding and generation.
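Since the weights are published under the model id above, inference follows the standard Hugging Face `transformers` pattern for causal LMs. The sketch below is a minimal example, not an official recipe; the generation settings are illustrative assumptions:

```python
# Minimal inference sketch for Latxa 7B v1.1 using the standard
# transformers causal-LM API. Generation parameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HiTZ/latxa-7b-v1.1"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue a Basque prompt with the Latxa base model."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Latxa is a base (non-instruct) model, so plain continuation
    # prompts work better than chat-style instructions.
    print(generate("Euskal Herriko mendirik ezagunenak "))
```

Note that this is a base model without instruction tuning, so it is best driven with continuation-style prompts rather than chat instructions.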

Key Capabilities

  • Basque Language Specialization: Optimized for high performance in Basque, trained on a high-quality, carefully filtered corpus.
  • Llama 2 Foundation: Inherits the robust architecture and commercial-friendly Llama 2 license.
  • Reproducible Research: Released alongside its pre-training corpora and evaluation datasets to foster research in low-resource language LLMs.
  • Strong Benchmarking: Achieves an average score of 42.26% across Basque evaluation benchmarks, including XStoryCloze, Belebele, BasqueGLUE, and EusProficiency, surpassing other 7B models such as Mistral 7B and Llama 2 7B.

Good For

  • Basque Language Applications: Ideal for developing applications requiring deep understanding and generation in Basque.
  • Further Fine-tuning: Serves as a strong base for task-specific or instruction fine-tuning for various Basque use cases.
  • Research in Low-Resource NLP: Provides a valuable resource for researchers working on LLMs for languages with limited digital resources.