CEIA-UFG/Gemma-3-Gaia-PT-BR-4b-it

4.3B parameters · BF16 · 32768-token context · vision-capable · License: gemma
Overview

GAIA: An Open Language Model for Brazilian Portuguese

GAIA (Gemma-3-Gaia-PT-BR-4b-it) is a 4.3-billion-parameter language model developed specifically for Brazilian Portuguese. It was created through a collaboration between the Center of Excellence in Artificial Intelligence (CEIA-UFG), the Brazilian Association of AI (ABRIA), Nama, Amadeus AI, and Google DeepMind. The model is based on google/gemma-3-4b-pt and underwent continuous pre-training on a 13-billion-token corpus of high-quality Portuguese data, including scientific articles and Wikipedia.

Key Capabilities

  • Brazilian Portuguese Specialization: Deep understanding and generation of text in Brazilian Portuguese.
  • Instruction Following: Designed to follow instructions for chat, question answering, and content generation.
  • Robust Foundation: Serves as a strong base model for fine-tuning on specific Portuguese NLP tasks.
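For chat use, prompts follow the Gemma turn-based format; in practice `tokenizer.apply_chat_template` from the `transformers` library produces this automatically. A minimal sketch of that formatting, assuming the standard Gemma-3 turn markers (`<start_of_turn>` / `<end_of_turn>`):

```python
def format_gemma_chat(messages):
    """Render a list of {"role", "content"} messages as a Gemma-style prompt.

    Illustrative only: real code should call tokenizer.apply_chat_template,
    which is the authoritative source for the model's chat template.
    """
    parts = []
    for message in messages:
        # Gemma uses "model" (not "assistant") as the reply role.
        role = "model" if message["role"] == "assistant" else "user"
        parts.append(f"<start_of_turn>{role}\n{message['content']}<end_of_turn>\n")
    # Leave an open "model" turn for the model to complete.
    parts.append("<start_of_turn>model\n")
    return "".join(parts)

prompt = format_gemma_chat([{"role": "user", "content": "Olá, tudo bem?"}])
# prompt == "<start_of_turn>user\nOlá, tudo bem?<end_of_turn>\n<start_of_turn>model\n"
```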

Performance Highlights

GAIA demonstrates competitive performance against the google/gemma-3-4b-it baseline, notably achieving a significant improvement on the ENEM 2024 benchmark (0.7000 vs 0.6556). Its development involved a unique weight merging technique to restore instruction-following capabilities after continuous pre-training, as detailed in the paper "Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs".
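The paper's exact merging recipe is not reproduced here, but a common form of weight merging is linear interpolation between corresponding parameters of the continued-pre-trained and instruction-tuned checkpoints. A toy sketch over plain Python lists (a real implementation would interpolate torch tensors in the two models' state dicts):

```python
def merge_weights(base, tuned, alpha=0.5):
    """Linearly interpolate two 'state dicts' of flat weight lists.

    alpha=0 returns `base` unchanged, alpha=1 returns `tuned`.
    Illustrative only: checkpoints store tensors, not Python lists.
    """
    assert base.keys() == tuned.keys(), "checkpoints must share parameter names"
    return {
        name: [(1 - alpha) * b + alpha * t
               for b, t in zip(base[name], tuned[name])]
        for name in base
    }

merged = merge_weights({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, alpha=0.5)
# merged == {"w": [0.5, 3.0]}
```

The interpolation factor `alpha` trades off the Portuguese knowledge gained during continuous pre-training against the instruction-following behavior of the tuned checkpoint.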

Good for

  • Direct use in chat, summarization, and creative content generation in Portuguese.
  • Fine-tuning for sentiment analysis, RAG systems, document classification, and specialized chatbots in Portuguese.