Tucano-1b1: A Portuguese-Native Foundational LLM
Tucano-1b1 is a 1.1-billion-parameter, Transformer-based causal language model developed by TucanoBR. It is part of the Tucano series, a family of models natively pretrained in Portuguese. The model was trained on GigaVerbo, a deduplicated corpus of 200 billion tokens of Portuguese text, giving it broad coverage of the language.
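The model can be loaded with the Hugging Face transformers library. A minimal sketch, assuming the checkpoint is published on the Hub under the TucanoBR/Tucano-1b1 identifier inferred from the names above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub identifier assumed from the model and organization names above
model_id = "TucanoBR/Tucano-1b1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```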
Key Capabilities & Features
- Native Portuguese Pretraining: Specifically designed and trained for the Portuguese language, unlike many multilingual models.
- Foundational Model: Intended for research and development, providing a controlled setting for experiments and a base for fine-tuning.
- Causal Language Modeling: Pretrained with a causal (next-token prediction) objective, making it suitable for text generation tasks in Portuguese (see the generation sketch after this list).
- Context Length: Supports a context length of 2048 tokens.
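To illustrate the generation workflow named above, here is a minimal sketch using the transformers pipeline API. The prompt and sampling parameters are illustrative, and the Hub identifier is assumed as before:

```python
from transformers import pipeline

# Hub identifier assumed from the model and organization names above
generator = pipeline("text-generation", model="TucanoBR/Tucano-1b1")

# Illustrative Portuguese prompt; the prompt plus max_new_tokens must fit
# within the model's 2048-token context window
output = generator(
    "A capital do Brasil é",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(output[0]["generated_text"])
```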
Intended Use Cases
- Research & Development: Ideal for academic and industrial research involving Portuguese language modeling.
- Comparative Experiments: Useful for studying the effects of native Portuguese pretraining on benchmark performance.
- Fine-tuning: Can be adapted and fine-tuned for specific downstream applications in Portuguese, provided users conduct their own risk and bias assessments (a minimal sketch follows this list).
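As a sketch of the fine-tuning path mentioned above, the following uses the standard Hugging Face Trainer for causal-LM fine-tuning. The corpus file name (meu_corpus.txt) and the hyperparameters are placeholders, not values from the Tucano authors:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "TucanoBR/Tucano-1b1"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The data collator needs a pad token; fall back to EOS if none is set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical local corpus: one Portuguese document per line
dataset = load_dataset("text", data_files={"train": "meu_corpus.txt"})

def tokenize(batch):
    # Truncate to the model's 2048-token context window
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tucano-1b1-finetuned",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    # mlm=False selects the causal (next-token) objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Batch size, epoch count, and sequence length should be tuned to the target task and available hardware; the values above are only meant to make the sketch runnable.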
Limitations
It is important to note that Tucano-1b1 is not intended for direct deployment as an out-of-the-box product. It has not been fine-tuned for downstream tasks, and it was trained exclusively on Portuguese text, so it should not be expected to perform well in other languages. Like other large language models, it is prone to hallucinations, can reproduce biases from its training data, and may generate unreliable code or repetitive output.