Gervásio 8B PTPT: A Specialized Portuguese Decoder
Gervásio 8B PTPT is an 8-billion-parameter open decoder model developed by the NLX-Natural Language and Speech Group at the University of Lisbon. It is built on the LLaMA 3.1 8B Instruct architecture and further trained on a diverse set of European Portuguese language resources.
Key Capabilities & Features
- Portuguese Language Specialization: Fine-tuned extensively on datasets native to or carefully translated into European Portuguese, including extraGLUE-Instruct, MMLU PT, Natural Instructions PT, a Wikipedia subset, and Portuguese proverbs.
- LLaMA Family Architecture: Benefits from the robust Transformer architecture of the LLaMA 3.1 8B Instruct model.
- Accessible Hardware Requirements: Designed to be runnable on consumer-grade hardware, promoting broader accessibility for research and commercial use.
- Performance: Demonstrates competitive performance against the base LLaMA 3.1 8B Instruct model on Portuguese benchmarks such as GPQA Diamond PT, MMLU PT, MMLU Pro PT, COPA, and MRPC.
- Open License: Distributed freely for both research and commercial purposes.
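Since the model is distributed openly, it can be used through the Hugging Face transformers library in the usual way. The sketch below is illustrative and hedged: the repository id, prompt, and generation settings are assumptions not stated in this card, and the download-and-generate step is wrapped in a function so the helper can be inspected without fetching the 8B-parameter weights.

```python
# Hypothetical usage sketch. MODEL_ID is an assumed Hugging Face repo id,
# not confirmed by this card; check the official model page for the real one.
MODEL_ID = "PORTULAN/gervasio-8b-portuguese-ptpt-decoder"

def build_messages(user_prompt: str) -> list[dict]:
    """Wrap a user prompt in the chat-message format used by instruct models."""
    return [{"role": "user", "content": user_prompt}]

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the model (on first call) and generate a completion.
    Requires the transformers library and enough memory for 8B parameters,
    which is why this is kept inside a function rather than run at import."""
    from transformers import pipeline
    generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")
    out = generator(build_messages(prompt), max_new_tokens=max_new_tokens)
    return out[0]["generated_text"]

# Example call (not executed here, since it downloads the weights):
# generate("Quem foi Luís de Camões?")
```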
Training Details
The model underwent supervised fine-tuning with a causal language modeling objective. A zero-out technique was employed: only the response tokens contributed to the loss and were back-propagated, while the prompt and chat-template tokens were masked out of the loss yet remained visible to the model's attention. Training was accelerated with the Fully Sharded Data Parallel (FSDP) paradigm across 10 L40S GPUs.
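The zero-out objective described above can be sketched in a few lines: prompt tokens receive the ignore label (-100, the index that cross-entropy implementations such as PyTorch's conventionally skip), so gradients flow only from the response tokens, while the full prompt-plus-response sequence is still fed to the model. The token ids and lengths below are purely illustrative.

```python
# Sketch of zero-out label masking for causal LM fine-tuning.
# -100 is the conventional ignore index skipped by cross-entropy loss.
IGNORE_INDEX = -100

def make_labels(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Build input ids and labels so the loss covers only the response.

    The model still sees (attends to) the whole sequence; the prompt is
    merely excluded from the loss by setting its labels to IGNORE_INDEX.
    """
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Illustrative example: a 4-token prompt followed by a 3-token response.
input_ids, labels = make_labels([101, 7, 8, 9], [21, 22, 2])
# input_ids -> [101, 7, 8, 9, 21, 22, 2]
# labels    -> [-100, -100, -100, -100, 21, 22, 2]
```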
Good For
- Portuguese Text Generation: Ideal for applications requiring high-quality, contextually relevant text generation in European Portuguese.
- Research in Portuguese NLP: Provides a strong baseline for further research and development in Portuguese natural language processing.
- Chatbot Integration: Integrated into the Evaristo.ai chatbot, showcasing its generative capabilities in an interactive environment.