Latxa: An Open Language Model for Basque
Latxa is a family of large language models developed by the HiTZ Research Center and the IXA research group to address the limitations of existing LLMs for low-resource languages such as Basque. This particular model, latxa-7b-v1.2, is a 7-billion-parameter variant based on Meta's Llama 2 architecture.
Key Capabilities & Features
- Basque Language Specialization: Continued pretraining on a new, high-quality Basque corpus of 4.3 million documents (4.2 billion tokens) gives the model strong command of Basque.
- Performance: Outperforms all previous open models for Basque by a significant margin, and its language proficiency and understanding on Basque-specific tasks are competitive with GPT-4 Turbo.
- Architecture: Inherits the Llama 2 architecture, providing a robust foundation.
- Open Availability: The Latxa models, together with the new pretraining corpus and evaluation datasets, are publicly available under open licenses, fostering research on LLMs for low-resource languages.
- Multilingual Context: While primarily focused on Basque, the training data also included 500K English documents from the Pile dataset to prevent catastrophic forgetting.
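The English-mixing step above can be sketched as a simple document sampler. This is an illustrative sketch only: the 5% mixing probability and the `mixed_stream` helper are assumptions for demonstration, not the actual Latxa training recipe.

```python
import random

def mixed_stream(basque_docs, english_docs, english_fraction=0.05, seed=0):
    """Interleave a small share of English documents into a Basque stream.

    Illustrative only: Latxa mixed 500K Pile documents into its Basque
    corpus to prevent catastrophic forgetting; the exact sampling scheme
    used in training is an assumption here.
    """
    rng = random.Random(seed)
    mixed, ei = [], 0
    for doc in basque_docs:
        # Occasionally emit an English document before the next Basque one.
        while ei < len(english_docs) and rng.random() < english_fraction:
            mixed.append(english_docs[ei])
            ei += 1
        mixed.append(doc)
    return mixed

# Toy corpora standing in for the real datasets.
stream = mixed_stream([f"eu_{i}" for i in range(1000)],
                      [f"en_{i}" for i in range(100)])
```

The Basque documents keep their original order, with English documents sprinkled in at roughly the requested fraction.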
Intended Use Cases
- Basque Language Processing: Ideal for tasks requiring deep understanding and generation in Basque.
- Further Fine-tuning: As a pre-trained LLM, it is suitable for further fine-tuning on specific Basque-centric applications or tasks.
- Research on Low-Resource Languages: Provides a valuable resource for researchers exploring methods to build LLMs for languages with limited digital resources.
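For the generation and fine-tuning use cases above, the model can be loaded with the Hugging Face `transformers` library. A minimal sketch follows; the hub id `HiTZ/latxa-7b-v1.2` and the `generate_basque` helper are assumptions to verify against the actual model page, and since the model is a base LLM, plain text continuation (not chat prompting) is the appropriate usage.

```python
def generate_basque(prompt: str, model_id: str = "HiTZ/latxa-7b-v1.2",
                    max_new_tokens: int = 64) -> str:
    """Continue a Basque prompt with latxa-7b-v1.2.

    Assumes the `transformers` library and the hub id above (an assumption,
    check the model page). Imports are kept inside the function so the
    sketch can be defined without the library installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding; the base model performs plain continuation, not chat.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Note that loading the full 7B model requires a GPU with sufficient memory (or quantization); this sketch omits those details.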
Limitations
- Language Specificity: Performance is not guaranteed for languages other than Basque.
- No Instruction Fine-tuning: The model is a base pretrained LLM; it is not instruction-tuned or designed as a chat assistant, so direct instruction following or conversational use is not recommended without further fine-tuning.