Model Overview
proxectonos/Llama-3.1-Carballo-Instr3 is an 8-billion-parameter causal language model, continually pretrained from Meta's Llama-3.1-8B. It was developed as part of the research presented in the paper "Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study," accepted to Findings of ACL 2025. The model focuses on improving performance for low-resource languages, specifically Galician, while maintaining proficiency in related languages.
Key Capabilities
- Multilingual Proficiency: Supports Galician, Portuguese, Spanish, English, and Catalan, with a particular emphasis on Galician.
- Continual Pretraining: Built on Llama-3.1-8B, the model underwent further pretraining on a 340-million-token multilingual corpus heavily weighted towards Galician (74% of the base corpus).
- Causal Language Modeling: Ready-to-use for text generation tasks and adaptable for fine-tuning in specific scenarios.
- Research-Backed Development: Emerged from experiments exploring Continued Pretraining (CPT) strategies for underrepresented languages.
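Since the model is a standard causal LM hosted on the Hugging Face Hub, it can be loaded with the usual `transformers` auto classes. The sketch below is a minimal, hedged example of text generation; the generation parameters (`max_new_tokens`, the sample prompt) are illustrative choices, not values prescribed by the model authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "proxectonos/Llama-3.1-Carballo-Instr3"


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Generate a continuation for `prompt` with the Carballo model.

    Loads the tokenizer and model from the Hub on each call for
    simplicity; in practice you would cache them once at startup.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

A typical call would be `generate("Galicia é unha comunidade autónoma")`, which returns the prompt followed by the model's continuation. Note that loading the 8B checkpoint requires roughly 16 GB of memory in fp16; pass `torch_dtype` and `device_map` arguments to `from_pretrained` as appropriate for your hardware.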
Good For
- Galician Language Applications: Ideal for text generation and language understanding tasks in Galician.
- Multilingual Text Generation: Suitable for generating text across its supported languages, especially where Galician context is important.
- Research in Low-Resource NLP: A valuable resource for researchers studying CPT and evaluation methodologies for less-resourced languages.