amadeusai/Amadeus-Verbo-FI-Qwen2.5-3B-PT-BR-Instruct

Hugging Face
Text Generation · Model Size: 3.1B · Quantization: BF16 · Context Length: 32k · Published: Mar 26, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

Amadeus-Verbo-FI-Qwen2.5-3B-PT-BR-Instruct is a 3.09-billion-parameter Transformer-based causal language model developed by amadeusai and fine-tuned from Qwen2.5-3B-Instruct. The model is optimized specifically for Brazilian Portuguese, having been fine-tuned for two epochs on a 600k-instruction dataset. It features a 32,768-token context length and is designed for instruction-following tasks in Portuguese.
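A minimal loading sketch with Hugging Face `transformers`, assuming the standard Qwen2.5-Instruct chat-template workflow (the exact invocation is an assumption based on that family's usage pattern, not taken from this card):

```python
MODEL_ID = "amadeusai/Amadeus-Verbo-FI-Qwen2.5-3B-PT-BR-Instruct"

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Answer a Brazilian Portuguese instruction with the fine-tuned model.

    Sketch only: follows the usual Qwen2.5-Instruct recipe
    (chat template -> generate -> decode new tokens).
    """
    # Imported here so the sketch stays lightweight until actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    messages = [
        {"role": "system", "content": "Você é um assistente útil."},
        {"role": "user", "content": prompt},
    ]
    # Render the conversation with the model's chat template.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the prompt.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply("Explique o que é aprendizado de máquina.")` downloads the weights on first use and returns the decoded Portuguese answer.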


Amadeus-Verbo-FI-Qwen2.5-3B-PT-BR-Instruct Overview

Amadeus-Verbo-FI-Qwen2.5-3B-PT-BR-Instruct is a specialized large language model (LLM) developed by amadeusai for the Brazilian Portuguese language. It is built on the Qwen2.5-3B-Instruct base model and fine-tuned for two epochs on a 600,000-instruction dataset.

Key Capabilities & Technical Details

  • Language Specialization: Primarily designed and optimized for Brazilian Portuguese (PT-BR).
  • Architecture: Utilizes a Transformer-based architecture incorporating features like RoPE, SwiGLU, RMSNorm, and Attention QKV bias.
  • Parameter Count: Features 3.09 billion parameters, with 2.77 billion non-embedding parameters.
  • Context Length: Supports a 32,768-token context window, enabling processing of longer inputs and generation of more coherent long-form responses.
  • Fine-tuning: Enhanced for instruction-following tasks through fine-tuning on a large instruction dataset.
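The parameter split quoted above can be sanity-checked with quick arithmetic. Assuming the Qwen2.5-3B base configuration (151,936-token vocabulary, 2,048-dimensional hidden size, tied embeddings; these figures come from the Qwen2.5 family specs, not this card), the embedding matrix accounts for roughly the gap between total and non-embedding parameters:

```python
# Qwen2.5-3B base configuration (assumed, from the Qwen2.5 family specs).
vocab_size = 151_936
hidden_size = 2_048

# Embedding parameters = vocab_size * hidden_size.
embedding_params = vocab_size * hidden_size   # ~0.31B

total_params = 3.09e9                         # card: 3.09 billion total
non_embedding = total_params - embedding_params

print(f"embedding:     {embedding_params / 1e9:.2f}B")
print(f"non-embedding: {non_embedding / 1e9:.2f}B")
```

The result, about 2.78B non-embedding parameters, matches the 2.77B figure quoted above to rounding.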

Good For

  • Brazilian Portuguese Applications: Ideal for applications requiring high-quality text generation and understanding in Brazilian Portuguese.
  • Instruction Following: Excels at responding to specific instructions and prompts due to its instruction-tuned nature.
  • Research and Development: Suitable for researchers and developers working on PT-BR NLP tasks, offering a strong base model with a substantial context window. Further technical details are available in the associated arXiv paper.