PORTULAN/gervasio-8b-portuguese-ptpt-decoder
PORTULAN's Gervásio 8B PTPT is an 8 billion parameter decoder model, part of the LLaMA family, specifically fine-tuned for the Portuguese language (Portugal variant). Developed by the NLX-Natural Language and Speech Group at the University of Lisbon, it leverages the LLaMA 3.1 8B Instruct architecture. This model is optimized for generative tasks in Portuguese and can be run on consumer-grade hardware, making it suitable for various research and commercial applications requiring high-quality Portuguese text generation.
Loading preview...
Gervásio 8B PTPT: A Specialized Portuguese Decoder
Gervásio 8B PTPT is an 8 billion parameter open decoder model, developed by the NLX-Natural Language and Speech Group at the University of Lisbon. It is built upon the LLaMA 3.1 8B Instruct architecture, specifically enhanced through additional training on a diverse set of Portuguese language resources.
Key Capabilities & Features
- Portuguese Language Specialization: Fine-tuned extensively on datasets native to or carefully translated into European Portuguese, including extraGLUE-Instruct, MMLU PT, Natural Instructions PT, a Wikipedia subset, and Portuguese proverbs.
- LLaMA Family Architecture: Benefits from the robust Transformer architecture of the LLaMA 3.1 8B Instruct model.
- Accessible Hardware Requirements: Designed to be runnable on consumer-grade hardware, promoting broader accessibility for research and commercial use.
- Performance: Demonstrates competitive performance against the base LLaMA 3.1 8B Instruct model on Portuguese benchmarks such as GPQA Diamond PT, MMLU PT, MMLU Pro PT, CoPA, and MRPC.
- Open License: Distributed freely for both research and commercial purposes.
Training Details
The model underwent supervised fine-tuning using a causal language modeling objective. A zero-out technique was employed, where only response tokens were back-propagated, while the entire prompt and chat template received attention. Training was accelerated using the Fully Sharded Data Parallel (FSDP) paradigm across 10 L40S GPUs.
Good For
- Portuguese Text Generation: Ideal for applications requiring high-quality, contextually relevant text generation in European Portuguese.
- Research in Portuguese NLP: Provides a strong baseline for further research and development in Portuguese natural language processing.
- Chatbot Integration: Integrated into the Evaristo.ai chatbot, showcasing its generative capabilities in an interactive environment.