Gervásio 8B PTPT: A Specialized Portuguese Decoder
Gervásio 8B PTPT is an 8-billion-parameter open decoder model developed by the NLX-Natural Language and Speech Group at the University of Lisbon. It is built on the LLaMA 3.1 8B Instruct architecture and further trained on a diverse set of European Portuguese language resources.
Key Capabilities & Features
- Portuguese Language Specialization: Fine-tuned extensively on datasets native to or carefully translated into European Portuguese, including extraGLUE-Instruct, MMLU PT, Natural Instructions PT, a Wikipedia subset, and Portuguese proverbs.
- LLaMA Family Architecture: Benefits from the robust Transformer architecture of the LLaMA 3.1 8B Instruct model.
- Accessible Hardware Requirements: Designed to be runnable on consumer-grade hardware, promoting broader accessibility for research and commercial use.
- Performance: Demonstrates competitive performance against the base LLaMA 3.1 8B Instruct model on Portuguese benchmarks such as GPQA Diamond PT, MMLU PT, MMLU Pro PT, COPA, and MRPC.
- Open License: Distributed freely for both research and commercial purposes.
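Since the model is distributed openly, it can be used through the Hugging Face transformers library in the usual way. The sketch below is illustrative and hedged: the repository id, prompt, and generation settings are assumptions not stated in this card, and the download-and-generate step is wrapped in a function so the helper can be inspected without fetching the 8B-parameter weights.

```python
# Hypothetical usage sketch. MODEL_ID is an assumed Hugging Face repo id,
# not confirmed by this card; check the official model page for the real one.
MODEL_ID = "PORTULAN/gervasio-8b-portuguese-ptpt-decoder"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format used by instruct models."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the model (on first call) and generate a completion.
    Requires the transformers library and enough memory for 8B parameters,
    which is why this is kept inside a function rather than run at import."""
    from transformers import pipeline
    generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    out = generator(build_messages(prompt), max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]

# Example call (not executed here, since it downloads the weights):
# generate("Quem foi Luís de Camões?")
```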
Training Details
The model underwent supervised fine-tuning with a causal language modeling objective. A zero-out technique was employed: only the response tokens contributed to the loss and were back-propagated, while the prompt and chat-template tokens were masked out of the loss yet remained visible to the model's attention. Training was accelerated with the Fully Sharded Data Parallel (FSDP) paradigm across 10 L40S GPUs.
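The zero-out objective described above can be sketched in a few lines: prompt tokens receive the ignore label (-100, the index that cross-entropy implementations such as PyTorch's conventionally skip), so gradients flow only from the response tokens, while the full prompt-plus-response sequence is still fed to the model. The token ids and lengths below are purely illustrative.

```python
# Sketch of zero-out label masking for causal LM fine-tuning.
# -100 is the conventional ignore index skipped by cross-entropy loss.
IGNORE_INDEX = -100

def make_labels(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Build input ids and labels so the loss covers only the response.

    The model still sees (attends to) the whole sequence; the prompt is
    merely excluded from the loss by setting its labels to IGNORE_INDEX.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Illustrative example: a 4-token prompt followed by a 3-token response.
input_ids, labels = make_labels([101, 7, 8, 9], [21, 22, 2])
# input_ids -> [101, 7, 8, 9, 21, 22, 2]
# labels    -> [-100, -100, -100, -100, 21, 22, 2]
```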
Good For
- Portuguese Text Generation: Ideal for applications requiring high-quality, contextually relevant text generation in European Portuguese.
- Research in Portuguese NLP: Provides a strong baseline for further research and development in Portuguese natural language processing.
- Chatbot Integration: Integrated into the Evaristo.ai chatbot, showcasing its generative capabilities in an interactive environment.